1.0 Executive Summary


The ultimate goal of the PROCEED AHEAD project was to advance the study and design of flexible and efficient techniques for processing encrypted data, outsourcing computation, and adding robustness to cryptographic computations. Examples of problems that would benefit from such techniques are the verification of policy compliance on encrypted data, spam detection on encrypted data, efficient delegation of computation, and more. The contributions of this program already improve, and will continue to improve, the ability to process and manage data, in particular encrypted data, in a robust and fully functional way without sacrificing the confidentiality of information or the privacy of users.

We have provided an implementation of Fully Homomorphic Encryption (FHE). The main goal of the implementation was to help us evaluate the different algorithmic approaches and to innovate on implementation techniques that bypass bottlenecks in the proposed schemes. We stress that building a usable implementation is not just about programming tricks but also about mathematical innovations that improve running times in practice. These improvements resulted in large constant-factor speed-ups that are ignored in the asymptotic analyses of theoretical research papers but are essential for a practical implementation. Together, these speed-ups improved the performance of fully homomorphic encryption by several orders of magnitude. They come from modifications to the basic schemes of Gentry and others that have little asymptotic impact but a large impact in practice.

An attractive approach to addressing privacy concerns is secure multiparty computation (SMC), as extensively studied in cryptography. However, previous solutions in this area are mostly theoretical and are rarely used in practice because of their complexity. To address this, we proposed a novel framework for the design and analysis of secure computation protocols that allows much simpler modeling and analysis of cryptographic protocols. Outsourcing computation has also suffered from a lack of efficient algorithms; to address this, we also created algorithms for delegating computation, as well as spam filters.

During just the first two years of the project, the PROCEED AHEAD team developed and implemented new and improved protocols for computing on encrypted data and deepened the understanding of the foundations of secure computation. The report that follows describes the work of IBM, Stanford University, and the University of California, San Diego on the PROCEED AHEAD project from February 2011 through March 2013.
2.0 Introduction and Program Overview


2.1 Background

Homomorphic encryption is a form of encryption in which a specified algebraic operation performed on the plaintexts corresponds to another (possibly different) algebraic operation performed on the ciphertexts. Several forms of homomorphic encryption allow either addition or multiplication on the plaintext, but to preserve the ring structure of the plaintext, both addition and multiplication must be supported. With such a scheme, any circuit can be evaluated homomorphically, effectively allowing the construction of programs that run on encryptions of their inputs to produce an encryption of their output. Since such a program never decrypts its input, it can be run by an untrusted party, or transmitted over an untrusted medium, without revealing its inputs or internal state.
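To make the idea of a scheme that supports only one operation concrete, the following minimal Python sketch uses the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, but the scheme offers no matching way to multiply plaintexts. Paillier is our own choice of illustration rather than a scheme from this project, and the tiny hard-coded primes and function names are hypothetical values that make it completely insecure.

```python
import math
import secrets

# Toy Paillier parameters (hypothetical, far too small to be secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                            # common simplified choice g = n + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # lambda = lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                 # valid because g = n + 1


def encrypt(m: int) -> int:
    """Encrypt m in Z_N as g^m * r^N mod N^2 with a random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts m1 + m2 mod N."""
    return (c1 * c2) % N2


if __name__ == "__main__":
    c = add_encrypted(encrypt(1234), encrypt(5678))
    print(decrypt(c))
```

A fully homomorphic scheme must provide both this kind of addition and a corresponding multiplication, which is what the schemes discussed in this report aim to do.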

The utility of such a scheme is well known, but the algorithms and computational complexity of current implementations are burdensome. To be useful, an efficient scheme must be developed and integrated into modern computing.

2.1.1 The PROCEED PROGRAM

PROgramming Computation on EncryptEd Data (PROCEED) is a program focused on creating practical methods for computing on encrypted data. It is made up of the six Technical Areas (TAs) listed below; the PROCEED AHEAD program reported on here covers only Technical Areas 2, 3, and 4.

■ TA1. Mathematical Foundations of Fully Homomorphic Encryption

■ TA2. Mathematical Foundations of Computation on Encrypted Data via Secure Multiparty Computation

■ TA3. Mathematical Foundations of Supporting Security Technologies

■ TA4. Implementation/Measurement/Optimization of Homomorphic Cryptography and Secure Multiparty Protocols

■ TA5. Algorithms for Computation on Encrypted Data

■ TA6. Programming Languages

The scope of the PROCEED effort is to design, develop, evaluate, integrate, demonstrate, and deliver: new mathematical foundations for efficient secure multiparty computation; new mathematical foundations for efficient computation on encrypted data and supporting technologies and techniques; implementations of known and new schemes and protocols, together with measurement and optimization of these implementations; libraries of efficient algorithms and data structures; new programming languages and accompanying compilers; and input to the Integration Contractor for the development of a common Application Programming Interface (API) and for the integration and evaluation of the research areas.
The PROCEED program is divided into the four phases described below.

■ Program Phase I: Initial Capabilities. Phase I will focus on developing initial capabilities for the mathematical foundations, measurement, algorithms for computation, and programming languages. An API will be developed and coordinated across all performers at the first Principal Investigator (PI) meeting.

■ Program Phase II: Alpha. Phase II will focus on the development of alpha-quality implementations of algorithms, optimized implementations, programming languages, an initial demonstration of remote regular-expression matching and a spam filter, and a refined definition of program metrics.

■ Program Phase III: Beta. Phase III will focus on beta development of core algorithms, interoperability integration, and the development of optimized implementations tested against defined program metrics.

■ Program Phase IV: Research Prototype. Phase IV will focus on the development of the research prototype and embedded application prototypes. Final demonstrations will include a functional spam filter prototype.

2.1.2 The AHEAD PROJECT

Advancing Homomorphic Encryption, its Applications and Derivatives (AHEAD) is a sub-project within the broader PROCEED program focused on conducting research in Technical Areas 2, 3, and 4. The AHEAD research team, which has an extensive background in secure multiparty computation and homomorphic encryption, is made up of IBM Research's Cryptography Group (Prime) and the Departments of Computer Science at Stanford University and the University of California, San Diego (UCSD). The team has worked to advance the study and design of flexible and efficient techniques for processing encrypted data, outsourcing computation, and adding robustness to cryptographic computations, including protection against accidental or malicious leakage of secret information. Examples of problems that will benefit from our work are the verification of policy compliance on encrypted data, spam detection on encrypted data, efficient delegation of computation, leakage-resilient computation, and more.
3.0 Program Work: Methods and Assumptions


3.1 Work Plans and Methods

Below we outline the work planned, as defined at the onset of the program, within the three PROCEED Technical Areas covered by AHEAD. Program work was carried out jointly by IBM Research, Stanford University, and UCSD. In addition to the technical work, the AHEAD research team participated in DARPA PI meetings and contributed to related DARPA events. All program results have been made available on the project website hosted by the PROCEED program integrator, Galois.

3.1.1 TA 2: Foundations of secure computations

■ Programming models for secure computation. Extend Yao's garbled circuits to handle arithmetic functions efficiently. Design protocols for repeated executions and for programs with loops.

■ Develop relations among different execution models, and construct general transformations for transferring desirable protocol properties from one model to another.

■ Design dedicated solutions for specific problems, e.g., pattern matching.

3.1.2 TA 3: Foundations of supporting security technologies

■ Design homomorphic encryption for certain function families to enable restricted computation delegation.

■ Verifying computation. Remove the need for FHE when delegating computation. Introduce proxy re-signatures in order to control malicious servers.

■ Prevent side-channel attacks using leakage resilience. Attempt to remove the reliance on leakage-resilient hardware.

3.1.3 TA 4: Implementing fully homomorphic encryption

■ Optimize fully homomorphic encryption by speeding up key generation and encryption, and shrink the public key and ciphertext sizes.

■ Explore fast two-party computation via fast Yao circuits.

3.2 Deliverables and Assumptions

Over the course of the planned four-year PROCEED AHEAD program, IBM, Stanford University, and UCSD planned to deliver the following:

■ An optimized implementation of fully homomorphic encryption.

■ For all three tasks, white papers and technical papers describing the results of the work.

■ Prototype implementations, where appropriate, as progress was made on the theoretical underpinnings of these tasks.
Towards the end of the project we initiated technology-transfer discussions with product groups within IBM and devoted resources to supporting a successful transition to real-world products. The completion criterion for all three tasks was planned to be the development of the required cryptographic systems, along with theoretical optimizations and, where appropriate, prototype software.

The following is an outline of the planned schedule for the PROCEED AHEAD program. When the schedule for program work and deliverables was defined, it was assumed that IBM, Stanford, and UCSD would all contribute jointly to program efforts for the duration of the planned four-year program. This final report covers the year-one and year-two contributions of the AHEAD team. At the close of year two, IBM left the program. Stanford and UCSD have continued their work under a new contract put in place with the Office of Naval Research and DARPA.

Year 1: Develop initial protocols for pattern matching on encrypted data. Begin investigating improved two-party computation techniques for the tasks outlined in the proposal. Experiment with optimizations to our fully homomorphic encryption implementation.

Year 2: Continue developing our techniques for two-party computation. Examine multiple executions of the same protocol in a multiparty setting. Tune protocols for pattern matching to the specific tasks of network guards and mail filtering on encrypted data. Continue experimenting with optimizations and modifications to the fully homomorphic system based on the results of year 1. Begin investigating the problem of delegating computation. Write and publish technical papers on intermediate results for all three tasks.

After Year 2 the program was cancelled. The work planned for Years 3 and 4 is outlined below but was not completed.

Year 3: Build on our work from year 2 and begin prototyping an application for our two-party protocols; extend the investigation to include loops. Work on additional aspects of computation delegation. Use our optimized fully homomorphic encryption scheme for real-world tasks such as network guards, and for other computations on encrypted data such as curve fitting (e.g., least-squares fit). Initiate discussions with product groups within IBM to promote technology transfer. Submit additional papers for publication at leading conferences.

Year 4: Tune the research results to address the needs of product groups within IBM and to support technology transfer. As appropriate, release open-source tools to enable other researchers to build on our work. Continue publishing technical papers on the results of our research.
4.0 Accomplishments and Results


4.1 Summary of Major Contributions

IBM's work in PROCEED (in cooperation with others) significantly advanced the state of the art in homomorphic encryption beyond the original blueprint of Gentry[1], taking it a big step toward practicality. First, the works of Gentry-Halevi[2] and Brakerski-Gentry-Vaikuntanathan[3] allow us to perform homomorphic computation without the need for squashing, or even bootstrapping. Specifically, the latter work provides a much better handle on the growth of the noise during homomorphic computation, resulting in a significant speedup. Then the work of Gentry-Halevi-Smart[4] provides effective tools for working on many plaintext values at once, again resulting in significant speedups. Finally, the works of Gentry-Halevi-Smart[5], Brakerski[6], Gentry-Halevi-Smart[7], and Brakerski-Gentry-Halevi[8] provide different variants and optimizations that are likely to have additional practical advantages. The IBM FARTHER program, administered through the Navy under PROCEED, also contributed to these results.

Taken together, these works already provide roughly three orders of magnitude speedup over the Gentry-Halevi[9] results, allowing us to homomorphically evaluate circuits that were out of reach using only the Gentry '09 blueprint. For example, Gentry-Halevi-Smart[7] demonstrated that the AES-128 circuit can be evaluated homomorphically in under 36 hours. Since then we have further optimized our code, and our current estimate is that we can do the same in just 3-4 hours.

UCSD has investigated the foundations and basic building blocks of lattice cryptography, which is used in the construction of some fully homomorphic encryption schemes, and has published its results in two papers: "Trapdoors for Lattices: Simpler, Tighter, Faster, Smaller"[10] and "Hardness of SIS and LWE with Small Parameters"[11].

The first paper describes a new method for generating computationally hard lattices together with a trapdoor basis. The method is much simpler and more efficient than previous methods, and it produces better-quality trapdoors. The second paper studies the parameters for which the short integer solution (SIS) and learning with errors (LWE) problems are provably as hard as worst-case lattice problems. These are the two most fundamental problems used in all lattice cryptographic constructions, and using small parameters has clear efficiency benefits. The SIS problem is used in the construction of certain homomorphic hash functions, and the paper shows that the modulus $q$ used in SIS can be set almost as low as $\sqrt{n}$ (previous work required $q > n$). The LWE problem is at the basis of the most recent fully homomorphic encryption schemes, and the parameter under investigation is the noise distribution. All previous work required the noise to be at least $\sqrt{n}$ in magnitude and to follow a Gaussian distribution. This can be undesirable, especially in the context of fully homomorphic encryption, because Gaussian distributions are harder to sample than, say, uniformly random strings, and because errors accumulate during the execution of homomorphic operations, so larger noise rates result in reduced homomorphic capabilities. Our work shows that, at least in some settings (when the number of LWE samples is sufficiently small), LWE is still hard when the noise is chosen uniformly at random and with much smaller magnitude.
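To make the role of the noise concrete, here is a minimal Python sketch of textbook secret-key LWE encryption of single bits (Regev-style), with the noise drawn uniformly from a small interval rather than from a Gaussian. The dimension, modulus, noise bound, and function names are hypothetical toy choices with no security, and the sketch says nothing about the hardness results above; it only shows why accumulated noise limits homomorphic capability: adding ciphertexts adds their noise terms, and decryption fails once the total reaches about $q/4$.

```python
import random

# Hypothetical toy parameters (not secure): dimension n, modulus q (even, so
# that q/2 is an integer), and noise drawn uniformly from [-B, B].
N_DIM = 16
Q = 4096
B = 2


def keygen(rng: random.Random) -> list[int]:
    return [rng.randrange(Q) for _ in range(N_DIM)]


def encrypt(s: list[int], bit: int, rng: random.Random) -> tuple[list[int], int]:
    """Return (a, b) with b = <a, s> + e + bit * q/2 mod q, e uniform in [-B, B]."""
    a = [rng.randrange(Q) for _ in range(N_DIM)]
    e = rng.randint(-B, B)
    b = (sum(x * y for x, y in zip(a, s)) + e + bit * (Q // 2)) % Q
    return a, b


def noisy_phase(s: list[int], ct: tuple[list[int], int]) -> int:
    """Centered b - <a, s> mod q: bit * q/2 plus the accumulated noise."""
    a, b = ct
    v = (b - sum(x * y for x, y in zip(a, s))) % Q
    return v - Q if v > Q // 2 else v


def decrypt(s: list[int], ct: tuple[list[int], int]) -> int:
    """Bit 0 leaves the phase near 0; bit 1 leaves it near +/- q/2."""
    return 1 if abs(noisy_phase(s, ct)) > Q // 4 else 0


def add(ct1: tuple[list[int], int], ct2: tuple[list[int], int]) -> tuple[list[int], int]:
    """Homomorphic XOR: add component-wise; the two noise terms also add."""
    a = [(x + y) % Q for x, y in zip(ct1[0], ct2[0])]
    return a, (ct1[1] + ct2[1]) % Q


if __name__ == "__main__":
    rng = random.Random(1)
    s = keygen(rng)
    bits = [rng.randrange(2) for _ in range(50)]
    acc = encrypt(s, 0, rng)
    for bit in bits:
        acc = add(acc, encrypt(s, bit, rng))
    # 51 fresh noise terms of size <= B give |noise| <= 102 < q/4, so this still decrypts.
    print(decrypt(s, acc) == sum(bits) % 2)
```

Smaller noise per fresh ciphertext means more operations fit under the $q/4$ threshold for a given modulus, which is why small (and easy-to-sample) noise is attractive for homomorphic schemes.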

UCSD has also proposed a novel framework for the design and analysis of secure computation protocols in the work "An equational approach to secure multi-party computation"[12]. The main feature of the framework is that it is fully asynchronous: local computations are independent of the relative ordering of messages coming from different communication channels. This allows much simpler modeling and analysis of cryptographic protocols, which no longer require a sequential ordering of all events. Besides making formal proofs of secure computation protocols more manageable, the framework also has potential efficiency benefits: since messages can be transmitted as soon as they can be computed (without compromising the security of the protocol), this may result in distributed protocols with lower latency. As a proof of concept, the paper analyzes two simple protocols, one for secure broadcast and one for verifiable secret sharing, which demonstrate how the framework can handle probabilistic protocols while remaining simple and equational.

Stanford's work has focused on different forms of homomorphic encryption, beginning with the introduction of a concept called "targeted malleability", which is designed to limit the homomorphic operations that can be performed on encrypted data[13]. The primary motivation is to restrict what can be done with ciphertexts. For example, a spam filter operating on encrypted data should be allowed to run only the spam predicate and nothing else. Several constructions for this concept have been provided.

Next, Stanford turned to optimizing fully homomorphic encryption. The team first developed a variant of the BGV system that eliminates the need for the expensive modulus-switching step[6]. This variant also allows any modulus to be used, including a power of 2, which can result in more efficient arithmetic. The resulting system has become known as Brakerski's FHE, named after the post-doc who developed it as part of the PROCEED program. Along the same lines, Stanford also examined a recent proposal by Bogdanov and Lee that constructs an efficient FHE scheme from coding-theoretic assumptions[14]. The team showed that the Bogdanov-Lee proposal is insecure and, in fact, that any construction using their approach will be insecure[15].

Using these new FHE systems, the team at Stanford built a prototype system that computes statistics on encrypted data, such as the mean, standard deviation, and linear regression. The implementation is based on an optimized version of Brakerski's FHE that takes advantage of its arithmetic properties. Stanford also used large-scale batching to speed up much of the computation. The resulting system can perform linear regression on moderate-size encrypted data sets within a few hours on a single laptop, and parallelism can bring this down to a few minutes.

Finally, since the security of FHE rests on hard problems on lattices, which are assumed to remain hard even in the presence of quantum computers, Stanford looked at secure cryptographic primitives in the age of quantum computation. In particular, the team built Message Authentication Codes that remain secure even when the devices using them are quantum[16]. One of the team's instantiations is a lattice-based MAC that presumably remains secure in a post-quantum setting.

4.2 Technology Transitions and Deliverables

The Stanford team developed a prototype system for performing statistical analysis on encrypted data. Their work focused on two tasks: computing the mean and variance of univariate and multivariate data, and performing linear regression on a multidimensional, encrypted corpus. Due to the high overhead of homomorphic computation, previous implementations of similar methods have been restricted to small datasets (on the order of a few hundred to a thousand elements) or to data of low dimension (generally 1-4).

In this work[17], the Stanford team first constructed a working implementation of the scale-invariant leveled homomorphic encryption system of Brakerski. Then, by taking advantage of batched computation and a message-encoding technique based on the Chinese Remainder Theorem, they showed that it becomes not only possible but computationally feasible to perform statistical analysis on encrypted datasets with over four million elements and dimension as high as 24. Using these methods along with some additional optimizations, the team demonstrated the viability of leveled homomorphic encryption for large-scale statistical analysis.
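The role of a CRT-based encoding can be illustrated with a plaintext-only Python sketch. The assumption here is that the same computation is run under several pairwise-coprime plaintext moduli and that the per-modulus results are recombined with the Chinese Remainder Theorem, recovering integer results larger than any single modulus; the mean and variance then follow from the decrypted sums of $x$ and $x^2$ by dividing in the clear. The moduli and function names are hypothetical, no encryption is performed, and this is not the Stanford implementation. Reconstruction is correct only while the true sums stay below the product of the moduli.

```python
from fractions import Fraction
from math import prod

# Hypothetical, pairwise-coprime plaintext moduli (distinct primes). Each
# per-modulus computation stands in for one encrypted evaluation.
MODULI = [65537, 999983, 1000003]


def crt_reconstruct(residues: list[int], moduli: list[int]) -> int:
    """Return the unique x mod prod(moduli) with x = r_i (mod m_i) for every i."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        x += r * partial * pow(partial, -1, m)
    return x % big_m


def sums_mod(data: list[int], m: int) -> tuple[int, int]:
    """What one evaluation mod m would yield: sum(x) and sum(x^2), both mod m."""
    return sum(data) % m, sum(x * x for x in data) % m


def mean_and_variance(data: list[int]) -> tuple[Fraction, Fraction]:
    """Recombine per-modulus sums with the CRT, then divide in the clear."""
    per_modulus = [sums_mod(data, m) for m in MODULI]
    s1 = crt_reconstruct([s for s, _ in per_modulus], MODULI)
    s2 = crt_reconstruct([t for _, t in per_modulus], MODULI)
    n = len(data)
    mean = Fraction(s1, n)
    return mean, Fraction(s2, n) - mean * mean


if __name__ == "__main__":
    data = list(range(1, 10001))
    # sum(x^2) = 333383335000 exceeds every single modulus, but not their product.
    print(mean_and_variance(data))
```

The design point is that each modulus can stay small enough for efficient homomorphic arithmetic, while the final answer is assembled only after decryption.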

The IBM team designed, implemented, and delivered a Homomorphic Encryption (HE) software library[18] that implements the Brakerski-Gentry-Vaikuntanathan (BGV) homomorphic encryption scheme, along with many optimizations that make homomorphic evaluation run faster, focusing mostly on effective use of the Smart-Vercauteren ciphertext packing techniques. Our library is written in C++ and uses the Number Theory Library (NTL). NTL is a high-performance, portable C++ library providing data structures and algorithms for manipulating signed, arbitrary-length integers, and for vectors, matrices, and polynomials over the integers and over finite fields (it can be found at http://www.shoup.net/ntl).

Very roughly, our HE library consists of four layers: the bottom layer contains modules implementing mathematical structures and various other utilities; the second layer implements our Double-CRT representation of polynomials; the third layer implements the cryptosystem itself (with the "native" plaintext space of binary polynomials); and the top layer provides interfaces for using the cryptosystem to operate on arrays of plaintext values. We think of the bottom two layers as the "math layers" and the top two layers as the "crypto layers", and we describe them in detail in our work[18]. A block-diagram description of the library is given in Figure 1.
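The Double-CRT idea can be sketched in a few lines. A polynomial whose coefficients live modulo a composite $Q = q_1 q_2 \cdots$ is stored as its residues modulo each small prime $q_i$ (the first CRT), and each residue polynomial is stored as its evaluations at the roots of the ring's defining polynomial (the second CRT), so that ring multiplication becomes slot-wise multiplication. The following Python sketch is a simplification under stated assumptions and is not the library's code: it uses the power-of-two ring $\mathbb{Z}_Q[X]/(X^n+1)$ (the library's own ring and parameters may differ), hypothetical toy primes $q_i \equiv 1 \pmod{2n}$, and naive $O(n^2)$ evaluation instead of an FFT.

```python
from math import prod

# Hypothetical toy parameters (assumption): ring Z_Q[X]/(X^N + 1) with N a power
# of two, and Q the product of small primes that are each 1 mod 2N.
N = 8
PRIMES = [17, 97, 113]
Q = prod(PRIMES)


def find_psi(q: int) -> int:
    """Return a primitive 2N-th root of unity mod q (any x with x^N = -1 mod q)."""
    for x in range(2, q):
        if pow(x, N, q) == q - 1:
            return x
    raise ValueError("q must be 1 mod 2N")


def to_eval(coeffs: list[int], q: int) -> list[int]:
    """Second CRT: evaluate a(X) mod q at psi^(2i+1), the N roots of X^N + 1."""
    psi = find_psi(q)
    return [sum(c * pow(psi, (2 * i + 1) * j, q) for j, c in enumerate(coeffs)) % q
            for i in range(N)]


def from_eval(values: list[int], q: int) -> list[int]:
    """Inverse of to_eval: a_k = N^-1 * sum_i A_i * psi^(-(2i+1)k) mod q."""
    psi_inv = pow(find_psi(q), -1, q)
    n_inv = pow(N, -1, q)
    return [n_inv * sum(v * pow(psi_inv, (2 * i + 1) * k, q)
                        for i, v in enumerate(values)) % q
            for k in range(N)]


def to_double_crt(coeffs: list[int]) -> list[list[int]]:
    """First CRT (reduce mod each prime), then second CRT (evaluate at roots)."""
    return [to_eval([c % q for c in coeffs], q) for q in PRIMES]


def mul_double_crt(x: list[list[int]], y: list[list[int]]) -> list[list[int]]:
    """Ring multiplication becomes slot-wise multiplication, prime by prime."""
    return [[(a * b) % q for a, b in zip(xq, yq)] for xq, yq, q in zip(x, y, PRIMES)]


def from_double_crt(rep: list[list[int]]) -> list[int]:
    """Interpolate mod each prime, then CRT-combine each coefficient mod Q."""
    per_prime = [from_eval(vals, q) for vals, q in zip(rep, PRIMES)]
    out = []
    for k in range(N):
        total = 0
        for q, coeffs in zip(PRIMES, per_prime):
            m = Q // q
            total += coeffs[k] * m * pow(m, -1, q)
        out.append(total % Q)
    return out


def negacyclic_mul(a: list[int], b: list[int]) -> list[int]:
    """Reference schoolbook product in Z_Q[X]/(X^N + 1), using X^N = -1."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                out[i + j] += a[i] * b[j]
            else:
                out[i + j - N] -= a[i] * b[j]
    return [c % Q for c in out]
```

Multiplying two polynomials in Double-CRT form and converting back should agree with `negacyclic_mul`; the payoff is that each slot-wise product touches only small machine-size numbers.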

At the top level of the library we provide interfaces that allow the application to manipulate arrays of plaintext values homomorphically. The arrays are translated to plaintext polynomials using the encoding/decoding routines, and then encrypted and manipulated homomorphically using the lower-level interfaces from the crypto layer.

The basic operations in the HE library are the usual key generation, encryption, and decryption; the homomorphic evaluation routines for addition, multiplication, and automorphism (as well as addition of a constant and multiplication by a constant); and the ciphertext-maintenance operations of key switching and modulus switching.

In addition to the software described above, the PROCEED AHEAD team has delivered many significant publications, which are summarized in the Publications section of this report.
Figure 1: A block diagram of the Homomorphic-Encryption library

4.3 Publications

Full versions of the PROCEED AHEAD papers are attached to this report, as noted in the appendix.

1. “Fully Homomorphic Encryption without Squashing Using Depth-3 Arithmetic Circuits” by Craig Gentry and Shai Halevi[2]

We describe a new approach for constructing fully homomorphic encryption (FHE) schemes. Previous FHE schemes all use the same blueprint from Gentry[1]: first construct a somewhat homomorphic encryption (SWHE) scheme, next "squash" the decryption circuit until it is simple enough to be handled within the homomorphic capacity of the SWHE scheme, and finally "bootstrap" to get an FHE scheme. In all existing schemes, the squashing technique induces an additional assumption: that the sparse subset sum problem (SSSP) is hard.

Our new approach constructs FHE as a hybrid of a SWHE scheme and a multiplicatively homomorphic encryption (MHE) scheme, such as Elgamal. Our construction eliminates the need for the squashing step, and thereby also removes the need to assume that the SSSP is hard. We describe a few concrete instantiations of the new method, including a "simple" FHE scheme where we replace SSSP with Decision Diffie-Hellman, an optimization of the simple scheme that lets us "compress" the FHE ciphertext into a single Elgamal ciphertext, and a scheme whose security can be (quantumly) reduced to the approximate ideal-SIVP.
We stress that the new approach still relies on bootstrapping, but it shows how to bootstrap without having to "squash" the decryption circuit. The main technique is to express the decryption function of SWHE schemes as a depth-3 ($\Sigma\Pi\Sigma$) arithmetic circuit of a particular form. When evaluating this circuit homomorphically (as needed for bootstrapping), we temporarily switch to an MHE scheme, such as Elgamal, to handle the $\Pi$ part. Due to the special form of the circuit, the switch to the MHE scheme can be done without having to evaluate anything homomorphically. We then translate the result back to the SWHE scheme by homomorphically evaluating the decryption function of the MHE scheme. Using our method, the SWHE scheme only needs to be capable of evaluating the MHE scheme's decryption function, not its own decryption function. We thereby avoid the circularity that necessitated squashing in the original blueprint.
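The switch to the MHE scheme relies on its multiplicative homomorphism. As a reminder of what that property is, this minimal Python sketch shows textbook Elgamal over $\mathbb{Z}_p^*$: the component-wise product of two ciphertexts decrypts to the product of the plaintexts. The prime, base, and function names are hypothetical toy choices; the sketch does not reproduce the paper's depth-3 circuit or its switching procedure, and it is not secure.

```python
import secrets

P = 2039   # hypothetical toy prime (2039 = 2 * 1019 + 1)
G = 7      # fixed base in Z_P^*


def keygen() -> tuple[int, int]:
    """Return (secret x, public h = g^x mod P)."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def encrypt(h: int, m: int) -> tuple[int, int]:
    """Encrypt m in Z_P^* as (g^r, m * h^r) for fresh random r."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P


def decrypt(x: int, ct: tuple[int, int]) -> int:
    """m = c2 / c1^x mod P (negative exponents need Python 3.8+)."""
    c1, c2 = ct
    return (c2 * pow(c1, -x, P)) % P


def mul_encrypted(ct1: tuple[int, int], ct2: tuple[int, int]) -> tuple[int, int]:
    """Component-wise product encrypts m1 * m2 mod P, with randomness r1 + r2."""
    return (ct1[0] * ct2[0]) % P, (ct1[1] * ct2[1]) % P
```

Because products of plaintexts are handled "for free" in this way, a $\Pi$ gate can be evaluated without the SWHE scheme having to compute it.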

2. “Fully Homomorphic Encryption without Bootstrapping” by Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan[3]

We present a radically new approach to fully homomorphic encryption (FHE) that dramatically improves performance and bases security on weaker assumptions. A central conceptual contribution in our work is a new way of constructing leveled fully homomorphic encryption schemes (capable of evaluating arbitrary polynomial-size circuits), without Gentry's bootstrapping procedure.

Specifically, we offer a choice of FHE schemes based on the learning with errors (LWE) or ring-LWE (RLWE) problems that have $2^\lambda$ security against known attacks. For RLWE, we have:

■ A leveled FHE scheme that can evaluate $L$-level arithmetic circuits with $\tilde{O}(\lambda \cdot L^3)$ per-gate computation, i.e., computation quasi-linear in the security parameter. Security is based on RLWE for an approximation factor exponential in $L$. This construction does not use the bootstrapping procedure.

■ A leveled FHE scheme that uses bootstrapping as an optimization, where the per-gate computation (which includes the bootstrapping procedure) is $\tilde{O}(\lambda^2)$, independent of $L$. Security is based on the hardness of RLWE for quasi-polynomial factors (as opposed to the sub-exponential factors needed in previous schemes).

We obtain similar results for LWE, but with worse performance. We introduce a number of further optimizations to our schemes. As an example, for circuits of large width, e.g., where a constant fraction of levels have width at least $\lambda$, we can reduce the per-gate computation of the bootstrapped version to $\tilde{O}(\lambda)$, independent of $L$, by batching the bootstrapping operation. Previous FHE schemes all required $\tilde{\Omega}(\lambda^{3.5})$ computation per gate.

At the core of our construction is a much more effective approach for managing the noise level of lattice-based ciphertexts as homomorphic operations are performed, using some new techniques recently introduced by Brakerski and Vaikuntanathan[19].
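The central noise-management tool in BGV is modulus switching: scaling a ciphertext from modulus $q$ down to a smaller modulus $p$ (with $p \equiv q \pmod 2$) while keeping each entry's parity. After the switch, the noise is roughly $(p/q)$ times the old noise plus a small additive term bounded by about $1 + \lVert s \rVert_1$, and the encrypted bit is preserved. The following minimal Python sketch illustrates this on a toy LWE-style ciphertext whose phase is $m + 2e$ with a small ternary secret. The parameters and function names are hypothetical, and the sketch omits key switching, ring structure, and everything else a real BGV implementation needs.

```python
import random
from fractions import Fraction

# Hypothetical toy parameters: dimension, two odd moduli with p = q (mod 2),
# and a bound on the fresh noise e. None of this is secure.
N_DIM = 8
Q_BIG = 1048573
P_SMALL = 1021
NOISE = 50


def centered(v: int, mod: int) -> int:
    """Representative of v mod `mod` in (-mod/2, mod/2]."""
    v %= mod
    return v - mod if v > mod // 2 else v


def encrypt(s: list[int], m: int, rng: random.Random) -> list[int]:
    """Toy ciphertext c = (b, a_1..a_n) mod Q_BIG with b - <a, s> = m + 2e."""
    a = [rng.randrange(Q_BIG) for _ in range(N_DIM)]
    e = rng.randint(-NOISE, NOISE)
    b = (sum(x * y for x, y in zip(a, s)) + m + 2 * e) % Q_BIG
    return [b] + a


def phase(s: list[int], c: list[int], mod: int) -> int:
    """Centered b - <a, s> mod `mod`: the message bit plus an even noise term."""
    return centered(c[0] - sum(x * y for x, y in zip(c[1:], s)), mod)


def decrypt(s: list[int], c: list[int], mod: int) -> int:
    return phase(s, c, mod) % 2


def mod_switch(c: list[int], q: int, p: int) -> list[int]:
    """Replace each entry x by the integer nearest x*p/q with the same parity as x."""
    out = []
    for x in c:
        y = Fraction(x * p, q)
        z = round(y)
        if (z - x) % 2:                      # parity differs: step toward y
            z = z + 1 if y > z else z - 1
        out.append(z % p)                    # reducing mod p keeps the phase mod p
    return out


if __name__ == "__main__":
    rng = random.Random(3)
    s = [rng.choice((-1, 0, 1)) for _ in range(N_DIM)]   # small ternary secret
    for m in (0, 1):
        c = encrypt(s, m, rng)
        c_small = mod_switch(c, Q_BIG, P_SMALL)
        print(m, decrypt(s, c_small, P_SMALL), phase(s, c, Q_BIG), phase(s, c_small, P_SMALL))
```

Parity is preserved because both moduli are odd and each entry keeps its parity, so the switched phase differs from $m$ by an even amount; the ratio of noise to modulus stays roughly the same, which is what lets BGV keep noise under control level by level without bootstrapping.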

3. “Targeted Malleability: Homomorphic Encryption for Restricted Computations” by Dan Boneh, Gil Segev, and Brent Waters[13]

We put forward the notion of targeted malleability: given a homomorphic encryption scheme, in various scenarios we would like to restrict the homomorphic computations one can perform on encrypted data. We introduce a precise framework, generalizing the foundational notion of non-malleability introduced by Dolev, Dwork, and Naor[20], ensuring that the malleability of a scheme is targeted only at a specific set of "allowable" functions.

In this setting we are mainly interested in the efficiency of such schemes as a function of the number of repeated homomorphic operations. Whereas constructing a scheme whose ciphertext grows linearly with the number of such operations is straightforward, obtaining more realistic (or merely non-trivial) length guarantees is significantly more challenging.

We  present  two  constructions  that  transform  any  homomorphic  encryption  scheme  into  one  that 
offers  targeted  malleability.  Our  constructions  rely  on  standard  cryptographic  tools  and  on 
succinct  non-interactive  arguments,  which  are  currently  known  to  exist  in  the  standard  model 
based  on  variants  of  the  knowledge-of-exponent  assumption.  The  two  constructions  offer 
somewhat  different  efficiency  guarantees,  each  of  which  may  be  preferable  depending  on  the 
underlying  building  blocks. 

4.  “Trapdoors  for  Lattices:  Simpler,  Tighter,  Faster,  Smaller”  by  Daniele  Micciancio  and 
Chris  Peikert[10] 

We  give  new  methods  for  generating  and  using  “strong  trapdoors”  in  cryptographic  lattices, 
which  are  simultaneously  simple,  efficient,  easy  to  implement  (even  in  parallel),  and 
asymptotically  optimal  with  very  small  hidden  constants.  Our  methods  involve  a  new  kind  of 
trapdoor,  and  include  specialized  algorithms  for  inverting  LWE,  randomly  sampling  SIS 
preimages,  and  securely  delegating  trapdoors.  These  tasks  were  previously  the  main  bottleneck 
for  a  wide  range  of  cryptographic  schemes,  and  our  techniques  substantially  improve  upon  the 
prior  ones,  both  in  terms  of  practical  performance  and  quality  of  the  produced  outputs.  Moreover, 
the  simple  structure  of  the  new  trapdoor  and  associated  algorithms  can  be  exposed  in 
applications,  leading  to  further  simplifications  and  efficiency  improvements.  We  exemplify  the 
applicability  of  our  methods  with  new  digital  signature  schemes  and  CCA-secure  encryption 
schemes,  which  have  better  efficiency  and  security  than  the  previously  known  lattice-based 
constructions. 

5.  “Homomorphic  Evaluation  of  the  AES  Circuit”  by  Craig  Gentry,  Shai  Halevi  and  Nigel 
Smart[7] 

We  describe  a  working  implementation  of  leveled  homomorphic  encryption  (without 
bootstrapping)  that  can  evaluate  the  AES-128  circuit  in  three  different  ways.  One  variant  takes 
roughly 36 hours to evaluate an entire AES encryption operation, using NTL (over GMP) as
our  underlying  software  platform,  and  running  on  a  large-memory  machine.  Using  SIMD 
techniques,  we  can  process  over  54  blocks  in  each  evaluation,  yielding  an  amortized  rate  of  just 
under  40  minutes  per  block.  Another  implementation  takes  just  over  two  and  a  half  days  to 
evaluate  the  AES  operation,  but  can  process  720  blocks  in  each  evaluation,  yielding  an  amortized 
rate  of  just  over  five  minutes  per  block.  We  also  detail  a  third  implementation,  which 
theoretically  could  yield  even  better  amortized  complexity,  but  in  practice  turns  out  to  be  less 
competitive. 
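The amortized rates follow from dividing the total evaluation time by the number of blocks packed into one evaluation. A quick check, taking 36 hours and two and a half days as the totals (the summary only gives these approximately, so the results are approximate too):

```python
# Amortized per-block cost of SIMD (batched) homomorphic AES evaluation.
# Totals are the approximate figures quoted in the summary above.

def amortized_minutes(total_hours: float, blocks: int) -> float:
    """Wall-clock minutes per AES block when `blocks` are packed into one evaluation."""
    return total_hours * 60.0 / blocks


variant_a = amortized_minutes(36.0, 54)        # ~36 hours, 54 blocks
variant_b = amortized_minutes(2.5 * 24, 720)   # ~2.5 days, 720 blocks

print(f"variant A: {variant_a:.1f} min/block")  # variant A: 40.0 min/block
print(f"variant B: {variant_b:.1f} min/block")  # variant B: 5.0 min/block
```

The "just under" and "just over" qualifiers in the text reflect that the true totals and block counts differ slightly from these round numbers.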
For our implementations we develop both AES-specific optimizations and several “generic”
tools for FHE evaluation. These last tools include (among others) a different variant of the
Brakerski-Vaikuntanathan key-switching technique that does not require reducing the norm of
the ciphertext vector, and a method of implementing the Brakerski-Gentry-Vaikuntanathan
modulus-switching transformation on ciphertexts in CRT representation.

6. “Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP”
by Zvika Brakerski [6]

We  present  a  new  tensoring  technique  for  LWE-based  fully  homomorphic  encryption.  While  in 
all previous works, the ciphertext noise grows quadratically ($B \to B^2 \cdot \mathrm{poly}(n)$) with every
multiplication (before “refreshing”), our noise only grows linearly ($B \to B \cdot \mathrm{poly}(n)$).
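To see why the growth rule matters, here is a toy model (an illustration only, not the paper's analysis) that counts how many multiplications fit before the noise reaches a stand-in decryption bound of $q/4$. The constant `factor` plays the role of $\mathrm{poly}(n)$; the values $q = 2^{200}$ and $B = \mathrm{factor} = 2^{10}$ are arbitrary assumptions.

```python
def max_depth(q: int, b0: int, factor: int, quadratic: bool) -> int:
    """Count multiplications before the noise bound reaches q/4.

    Toy model: noise starts at b0 and each multiplication maps
    B -> B^2 * factor (quadratic) or B -> B * factor (linear).
    `factor` stands in for poly(n); q/4 stands in for the decryption bound.
    """
    depth, noise = 0, b0
    while True:
        nxt = noise * noise * factor if quadratic else noise * factor
        if nxt >= q // 4:
            return depth
        noise, depth = nxt, depth + 1


q = 2**200
print("quadratic:", max_depth(q, 2**10, 2**10, quadratic=True))   # quadratic: 3
print("linear:", max_depth(q, 2**10, 2**10, quadratic=False))     # linear: 18
```

Under the linear rule the loop condition amounts to $B \cdot \mathrm{factor}^d < q/4$, so the supported depth depends only on the ratio $q/B$, which loosely mirrors the scale-invariance property described next.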

We  use  this  technique  to  construct  a  scale-invariant  fully  homomorphic  encryption  scheme, 
whose  properties  only  depend  on  the  ratio  between  the  modulus  q  and  the  initial  noise  level  B, 
and  not  on  their  absolute  values. 

Our  scheme  has  a  number  of  advantages  over  previous  candidates:  It  uses  the  same  modulus 
throughout  the  evaluation  process  (no  need  for  “modulus  switching”),  and  this  modulus  can  take 
arbitrary  form.  In  addition,  security  can  be  classically  reduced  from  the  worst-case  hardness  of 
the  GapSVP  problem  (with  quasi-polynomial  approximation  factor),  whereas  previous 
constructions  could  only  exhibit  a  quantum  reduction  from  GapSVP. 

7.  “When  Homomorphism  Becomes  a  Liability”  by  Zvika  Brakerski[15] 

We  show  that  an  encryption  scheme  cannot  have  a  simple  decryption  function  and  be 
homomorphic  at  the  same  time,  even  with  added  noise.  Specifically,  if  a  scheme  can 
homomorphically  evaluate  the  majority  function,  then  its  decryption  cannot  be  weakly-learnable 
(in  particular,  linear),  even  if  large  decryption  error  is  allowed.  (In  contrast,  without 
homomorphism,  such  schemes  do  exist  and  are  presumed  secure,  e.g.,  based  on  LPN.) 

An  immediate  corollary  is  that  known  schemes  that  are  based  on  the  hardness  of  decoding  in  the 
presence of low Hamming-weight noise cannot be fully homomorphic. This applies to known
schemes  such  as  LPN-based  symmetric  or  public  key  encryption. 

Using  these  techniques,  we  show  that  the  recent  candidate  fully  homomorphic  encryption, 
suggested  by  Bogdanov  and  Lee  (BL)[14],  is  insecure.  In  fact,  we  show  two  attacks  on  the  BL 
scheme:  One  that  uses  homomorphism,  and  another  that  directly  attacks  a  component  of  the 
scheme. 

8.  “Quantum-secure  Message  Authentication  Codes”  by  Dan  Boneh  and  Mark  Zhandry[16] 

We  construct  the  first  Message  Authentication  Codes  (MACs)  that  are  existentially  unforgeable 
against  a  quantum  chosen  message  attack.  These  chosen  message  attacks  model  a  quantum 
adversary’s  ability  to  obtain  the  MAC  on  a  superposition  of  messages  of  its  choice.  We  begin  by 
showing  that  a  quantum  secure  PRF  is  sufficient  for  constructing  a  quantum  secure  MAC,  a  fact 
that  is  considerably  harder  to  prove  than  its  classical  analogue.  Next,  we  show  that  a  variant  of 
Carter-Wegman MACs can be proven to be quantum secure. Unlike the classical setting, we
present an attack showing that a pair-wise independent hash family is insufficient to construct a
quantum  secure  one-time  MAC,  but  we  prove  that  a  four-wise  independent  family  is  sufficient 
for  one-time  security. 
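A standard four-wise independent family is the set of random polynomials of degree at most 3 over a prime field. The sketch below uses it as a one-time MAC (tag = polynomial evaluated at the message). It shows only the classical construction; the quantum security argument is the paper's contribution and is not reflected in the code. The prime $2^{127}-1$ is an arbitrary choice.

```python
import secrets

P = 2**127 - 1  # Mersenne prime; messages (< P), key coefficients and tags live in Z_P


def keygen() -> tuple:
    """One-time key: coefficients (k0, k1, k2, k3) of a random polynomial of degree <= 3."""
    return tuple(secrets.randbelow(P) for _ in range(4))


def tag(key: tuple, msg: int) -> int:
    """h_key(msg) = k0 + k1*m + k2*m^2 + k3*m^3 mod P, evaluated with Horner's rule."""
    acc = 0
    for coeff in reversed(key):
        acc = (acc * msg + coeff) % P
    return acc


def verify(key: tuple, msg: int, t: int) -> bool:
    return tag(key, msg) == t


key = keygen()
t = tag(key, 2024)
print(verify(key, 2024, t))  # True
```

A key must authenticate a single message; reusing it across messages is outside the one-time setting.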

9. “Dynamic Proofs of Retrievability via Oblivious RAM” by David Cash, Alptekin
Kupcu and Daniel Wichs [21]

Proofs  of  retrievability  allow  a  client  to  store  her  data  on  a  remote  server  ("in  the  cloud")  and 
periodically  execute  an  efficient  audit  protocol  to  check  that  all  of  the  data  is  being  maintained 
correctly  and  can  be  recovered  from  the  server.  For  efficiency,  the  computation  and 
communication  of  the  server  and  client  during  an  audit  protocol  should  be  significantly  smaller 
than  reading/transmitting  the  data  in  its  entirety.  Although  the  server  is  only  asked  to  access  a 
few  locations  of  its  storage  during  an  audit,  it  must  maintain  full  knowledge  of  all  client  data  to 
be  able  to  pass. 

Starting  with  the  work  of  Juels  and  Kaliski[22],  all  prior  solutions  to  this  problem  crucially 
assume  that  the  client  data  is  static  and  do  not  allow  it  to  be  efficiently  updated.  Indeed,  they  all 
store  a  redundant  encoding  of  the  data  on  the  server,  so  that  the  server  must  delete  a  large 
fraction  of  its  storage  to  'lose'  any  actual  content.  Unfortunately,  this  means  that  even  a  single  bit 
modification  to  the  original  data  will  need  to  modify  a  large  fraction  of  the  server  storage,  which 
makes  updates  highly  inefficient.  Overcoming  this  limitation  was  left  as  the  main  open  problem 
by  all  prior  works. 

In  this  work  we  give  the  first  solution  providing  proofs  of  retrievability  for  dynamic  storage, 
where  the  client  can  perform  arbitrary  reads/writes  on  any  location  within  her  data  by  running  an 
efficient  protocol  with  the  server.  At  any  point  in  time,  the  client  can  execute  an  efficient  audit 
protocol  to  ensure  that  the  server  maintains  the  latest  version  of  the  client  data.  The  computation 
and  communication  complexity  of  the  server  and  client  in  our  protocols  is  only  polylogarithmic 
in  the  size  of  the  client's  data.  The  starting  point  of  our  solution  is  to  split  up  the  data  into  small 
blocks  and  redundantly  encode  each  block  of  data  individually,  so  that  an  update  inside  any  data 
block  only  affects  a  few  codeword  symbols.  The  main  difficulty  is  to  prevent  the  server  from 
identifying  and  deleting  too  many  codeword  symbols  belonging  to  any  single  data  block.  We  do 
so by hiding where the various codeword symbols for any individual data block are stored on the
server  and  when  they  are  being  accessed  by  the  client,  using  the  algorithmic  techniques  of 
oblivious  RAM. 
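The starting point (encode each small block separately so that an update touches only a few codeword symbols) can be sketched as below. This illustrates update locality only: the Reed-Solomon code over a prime field and the block sizes are assumptions chosen for the example, and the ORAM layer that hides symbol locations from the server is omitted.

```python
P = 2**61 - 1  # prime modulus for the toy code (a Mersenne prime)


def rs_encode(block: list, n: int) -> list:
    """Reed-Solomon: read the k symbols as polynomial coefficients, evaluate at x = 1..n."""
    out = []
    for x in range(1, n + 1):
        acc = 0
        for c in reversed(block):
            acc = (acc * x + c) % P
        out.append(acc)
    return out


def encode_file(data: list, k: int, n: int) -> list:
    """Split data into k-symbol blocks and encode each block independently."""
    return [rs_encode(data[i:i + k], n) for i in range(0, len(data), k)]


def update(codewords: list, data: list, index: int, value: int, k: int, n: int) -> None:
    """Overwrite one data symbol and re-encode only the block that contains it."""
    data[index] = value
    b = index // k
    codewords[b] = rs_encode(data[b * k:(b + 1) * k], n)


data = list(range(12))
cw = encode_file(data, k=4, n=8)          # 3 blocks -> 3 codewords of 8 symbols
before = [sym for blk in cw for sym in blk]
update(cw, data, index=5, value=999, k=4, n=8)
after = [sym for blk in cw for sym in blk]
print(sum(1 for u, v in zip(before, after) if u != v))  # 8: only block 1 changed
```

The cost of this locality is exactly the weakness the text describes: a server that knew which 8 symbols belong to one block could delete them, which is why their locations must be hidden.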

10. “Hardness of SIS and LWE with Small Parameters” by Daniele Micciancio and Chris
Peikert [11]

The  Short  Integer  Solution  (SIS)  and  Learning  With  Errors  (LWE)  problems  are  the  foundations 
for  countless  applications  in  lattice-based  cryptography,  and  are  provably  as  hard  as  approximate 
lattice  problems  in  the  worst  case.  An  important  question  from  both  a  practical  and  theoretical 
perspective  is  how  small  their  parameters  can  be  made,  while  preserving  their  worst-case 
hardness. We prove two main results on SIS and LWE with small parameters. For SIS, we show
that the problem retains worst-case hardness for moduli $q \ge \beta \cdot n^{\delta}$ for any constant
$\delta > 0$, where $\beta$ is the bound on the Euclidean norm of the solution. This improves upon prior
results, which required $q \ge \beta \cdot \sqrt{n \log n}$, and is essentially optimal since the problem is
trivially easy for $q \le \beta$. For LWE, we show that it remains hard even when the errors are small
(e.g., uniformly random from $\{0, 1\}$), provided that the number of samples is small enough (e.g.,
linear in the dimension $n$ of the LWE secret). Prior results required the errors to have magnitude
at least $\sqrt{n}$ and to come from a Gaussian-like distribution.
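For concreteness, the following sketch generates LWE samples of the small-error kind discussed above: errors uniform in $\{0, 1\}$ and a number of samples $m = 2n$, linear in the dimension. The parameters ($n = 16$, $q = 3329$) are arbitrary toy values with no security meaning; the hardness result says nothing about this particular instance.

```python
import random


def lwe_samples(n: int, m: int, q: int, rng: random.Random):
    """Return (A, s, e, b) with b = A*s + e mod q and each error e_i uniform in {0, 1}."""
    s = [rng.randrange(q) for _ in range(n)]
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    e = [rng.randrange(2) for _ in range(m)]
    b = [(sum(a * x for a, x in zip(row, s)) + err) % q for row, err in zip(A, e)]
    return A, s, e, b


n, q = 16, 3329
A, s, e, b = lwe_samples(n, m=2 * n, q=q, rng=random.Random(0))
# Knowing s, the residuals b - A*s recover the binary errors exactly.
residuals = [(bi - sum(a * x for a, x in zip(row, s))) % q for row, bi in zip(A, b)]
print(residuals == e)  # True
```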

11. “How to Delegate Secure Multiparty Computation to the Cloud” by Nishanth Chandran,
Rosario Gennaro, Abhishek Jain and Amit Sahai [23]

We  initiate  the  study  of  verifiable  computation  in  the  presence  of  many  clients  who  rely  on  a 
server  to  perform  computations  over  inputs  privately  held  by  each  client.  This  generalizes  the 
single-client  model  for  verifiable  outsourced  computation  previously  studied  in  the  literature.  We 
put  forward  a  computational  model  and  security  definitions  for  this  task.  We  then  present  a  new 
protocol  that  allows  the  clients  to  securely  outsource  an  arbitrary  computation  over  privately  held 
inputs  to  a  powerful  server.  At  the  end  the  clients  will  be  assured  that  the  result  of  the 
computation  is  correct,  while  at  the  same  time  protecting  their  data  from  the  server  and  each 
other.  Our  new  protocol  satisfies  the  crucial  efficiency  requirement  of  outsourced  computation 
where  the  work  of  the  client  is  substantially  smaller  than  what  is  required  to  compute  the 
function. We use the amortized model of Gennaro et al., where the clients are allowed to invest
in a one-time, computationally expensive preprocessing phase. Additionally, our protocol
minimizes the interaction between the clients by requiring only one round of interaction between
them for each computation outsourced to the server. Such a single round of interaction is necessary
if input privacy is to be preserved.

12.  “An  Equational  Approach  to  Secure  Multi-Party  Computation”  by  Daniele  Micciancio 
and  Stefano  Tessaro[12] 

We  present  a  novel  framework  for  the  description  and  analysis  of  secure  computation  protocols 
that  is  at  the  same  time  mathematically  rigorous  and  notationally  lightweight  and  concise.  The 
distinguishing feature of the framework is that it allows one to specify (and analyze) protocols in a
manner  that  is  largely  independent  of  time,  greatly  simplifying  the  study  of  cryptographic 
protocols.  At  the  notational  level,  protocols  are  described  by  systems  of  mathematical  equations 
(over  domains),  and  can  be  studied  through  simple  algebraic  manipulations  like  substitutions  and 
variable  elimination.  We  exemplify  our  framework  by  analyzing  in  detail  two  classic  protocols:  a 
protocol  for  secure  broadcast,  and  a  verifiable  secret  sharing  protocol,  the  second  of  which 
illustrates  the  ability  of  our  framework  to  deal  with  probabilistic  systems,  still  in  a  purely 
equational  way. 

13. “Semantic Security for the Wiretap Channel” by Mihir Bellare, Stefano Tessaro, and
Alexander  Vardy  [24] 

The  wiretap  channel  is  a  setting  where  one  aims  to  provide  information-theoretic  privacy  of 
communicated  data  based  solely  on  the  assumption  that  the  channel  from  sender  to  adversary  is 
“noisier”  than  the  channel  from  sender  to  receiver.  It  has  developed  in  the  Information  and 
Coding  (I&C)  community  over  the  last  30  years  largely  divorced  from  the  parallel  development 
of  modern  cryptography.  This  paper  aims  to  bridge  the  gap  with  a  cryptographic  treatment 
involving  advances  on  two  fronts,  namely  definitions  and  schemes.  On  the  first  front 
(definitions), we explain that the mis-r definition in current use is weak and propose two
alternatives: mis (based on mutual information) and ss (based on the classical notion of semantic
security). We prove them equivalent, thereby connecting two fundamentally different ways of
defining privacy and providing a new, strong and well-founded target for constructions. On the
second front (schemes), we provide the first explicit scheme with all the following
characteristics: it is proven to achieve both security (ss and mis, not just mis-r) and decodability;
it has optimal rate; and both the encryption and decryption algorithms are proven to be
polynomial-time.

Approved for Public Release; Distribution Unlimited.

14. “Multi-Instance Security and its Application to Password-Based Cryptography” by
Mihir Bellare, Thomas Ristenpart, and Stefano Tessaro [25]

This  paper  develops  a  theory  of  multi-instance  (mi)  security  and  applies  it  to  provide  the  first 
proof-based support for the classical practice of salting in password-based cryptography.
Mi-security comes into play in settings (like password-based cryptography) where it is
computationally  feasible  to  compromise  a  single  instance,  and  provides  a  second  line  of  defense, 
aiming  to  ensure  (in  the  case  of  passwords,  via  salting)  that  the  effort  to  compromise  all  of  some 
large  number  m  of  instances  grows  linearly  with  m.  The  first  challenge  is  definitions,  where  we 
suggest  LORX-security  as  a  good  metric  for  mi  security  of  encryption  and  support  this  claim  by 
showing  it  implies  other  natural  metrics,  illustrating  in  the  process  that  even  lifting  simple  results 
from  the  si  setting  to  the  mi  one  calls  for  new  techniques.  Next  we  provide  a  composition-based 
framework to transfer standard single-instance (si) security to mi-security with the aid of a
key-derivation function. By analyzing password-based KDFs from the PKCS#5 standard to show that
they meet our indifferentiability-style mi-security definition for KDFs, we are able to conclude
with the first proof that per-password salts amplify mi-security as hoped in practice. We believe
that  mi-security  is  of  interest  in  other  domains  and  that  this  work  provides  the  foundation  for  its 
further  theoretical  development  and  practical  application. 
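The practice being analyzed, deriving each stored key from the password together with its own random salt, looks like this with PBKDF2 (one of the PKCS#5 password-based KDFs) via Python's `hashlib.pbkdf2_hmac`. The salt length and iteration count are illustrative assumptions, not recommendations, and the code shows only the salting practice, not the mi-security proof.

```python
import hashlib
import os

ITERATIONS = 100_000  # illustrative work factor only


def derive_key(password: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 (the PKCS#5 v2 password-based KDF)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def new_record(password: str):
    """Store a fresh random salt alongside each derived key."""
    salt = os.urandom(16)
    return salt, derive_key(password, salt)


salt1, key1 = new_record("correct horse")
salt2, key2 = new_record("correct horse")  # same password, different user
print(key1 != key2)                                # True unless the two salts collide
print(derive_key("correct horse", salt1) == key1)  # True
```

Because the salts differ, a single dictionary computation cannot be reused across both records; an attacker must repeat the work per instance, which is the linear-in-m effort the paper formalizes.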

15. “To Hash or Not to Hash Again? (In)differentiability Results for $H^2$ and HMAC” by
Yevgeniy Dodis, Thomas Ristenpart, John Steinberger, and Stefano Tessaro [26]

We show that the second iterate $H^2(M) = H(H(M))$ of a random oracle $H$ cannot achieve strong
security in the sense of indifferentiability from a random oracle. We do so by proving that
indifferentiability for $H^2$ holds only with poor concrete security by providing a lower bound (via
an  attack)  and  a  matching  upper  bound  (via  a  proof  requiring  new  techniques)  on  the  complexity 
of  any  successful  simulator.  We  then  investigate  HMAC  when  it  is  used  as  a  general-purpose 
hash  function  with  arbitrary  keys  (and  not  as  a  MAC  or  PRF  with  uniform,  secret  keys).  We 
uncover  that  HMAC’s  handling  of  keys  gives  rise  to  two  types  of  weak  key  pairs.  The  first 
allows  trivial  attacks  against  its  indifferentiability;  the  second  gives  rise  to  structural  issues 
similar to that which ruled out strong indifferentiability bounds in the case of $H^2$. However, such
weak key pairs do not arise, as far as we know, in any deployed applications of HMAC. For
example, using keys of any fixed length shorter than $d - 1$, where $d$ is the block length in bits of
the underlying hash function, completely avoids weak key pairs. We therefore conclude with a
positive result: a proof that HMAC is indifferentiable from a RO (with standard, good bounds)
when applications use keys of a fixed length less than $d - 1$.
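HMAC's key handling (RFC 2104: keys longer than the block length are hashed first, and every key is then zero-padded to the block length) makes colliding key pairs easy to exhibit with Python's standard `hmac` module. These pairs illustrate the kind of weak keys discussed above; whether they correspond exactly to the paper's two categories is not claimed here.

```python
import hashlib
import hmac

msg = b"attack at dawn"

# Keys shorter than the block length are zero-padded, so K and K || 0x00 collide.
t1 = hmac.new(b"secret", msg, hashlib.sha256).digest()
t2 = hmac.new(b"secret\x00", msg, hashlib.sha256).digest()
print(t1 == t2)  # True

# Keys longer than the block length (64 bytes for SHA-256) are hashed first,
# so K and H(K) collide.
long_key = b"k" * 100
t3 = hmac.new(long_key, msg, hashlib.sha256).digest()
t4 = hmac.new(hashlib.sha256(long_key).digest(), msg, hashlib.sha256).digest()
print(t3 == t4)  # True
```

With keys of one fixed length below the block length, neither pair can occur: appending a zero byte changes the length, and no key is long enough to be hashed.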


16. “Design and Implementation of a Homomorphic-Encryption Library” by Shai Halevi and
Victor Shoup


We describe the design and implementation of a software library that implements the
Brakerski-Gentry-Vaikuntanathan (BGV) homomorphic encryption scheme, along with many optimizations
to make homomorphic evaluation run faster, focusing mostly on effective use of the
Smart-Vercauteren ciphertext packing techniques. Our library is written in C++ and uses the NTL
mathematical library.

17. “Using Homomorphic Encryption for Large Scale Statistical Analysis” by David Wu,
Jacob Haven and Dan Boneh

We describe, in viewgraph format, the scale-invariant leveled fully homomorphic encryption
scheme. With this scheme we are able to use batching and CRT-based message encoding to
perform large-scale statistical analysis on millions of data points and data of moderate dimension.
The charts progress through the motivation, the approach, the theory of computation on large
integers, the client and server sides of the homomorphic encryption scheme, experimental
timing results, and conclusions.
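CRT-based message encoding lets a computation on large integers run as several independent computations modulo small plaintext moduli, with the result recombined by the Chinese Remainder Theorem. The sketch below does the per-modulus arithmetic in the clear (in the homomorphic setting each residue would be handled under a separate plaintext modulus); the three moduli are an arbitrary assumption, and decoding is exact only while the true result stays below their product.

```python
from math import prod

# Pairwise-coprime plaintext moduli (assumed for illustration; all three are primes).
MODULI = [257, 65537, 2**31 - 1]
T = prod(MODULI)


def encode(x: int) -> list:
    """Represent x by its residues modulo each plaintext modulus."""
    return [x % t for t in MODULI]


def add(u: list, v: list) -> list:
    return [(a + b) % t for a, b, t in zip(u, v, MODULI)]


def mul(u: list, v: list) -> list:
    return [(a * b) % t for a, b, t in zip(u, v, MODULI)]


def decode(residues: list) -> int:
    """Chinese Remainder Theorem reconstruction of the unique value in [0, T)."""
    x = 0
    for r, t in zip(residues, MODULI):
        t_hat = T // t
        x += r * t_hat * pow(t_hat, -1, t)
    return x % T


data = [123456, 789012, 345678]
acc = encode(0)
for v in data:
    acc = add(acc, mul(encode(v), encode(v)))  # sum of squares, computed per modulus
print(decode(acc) == sum(v * v for v in data))  # True, since the true sum is below T
```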
5.0 Conclusions and Recommendations


The PROCEED AHEAD team has delivered outstanding results in just the first half (two years)
of the intended four-year project. Our results range from fundamental theoretical contributions
to the development of Fully Homomorphic Encryption (FHE), special forms of FHE and better
lattice design, through applications such as delegation of computation and a better understanding
of the foundations of multiparty computation, to implementations and optimizations of FHE.

There is significant opportunity to build upon our work from the first two years of the
PROCEED AHEAD project, including additional aspects of computation delegation. There is
also an opening to apply our prototype system for statistical analysis of large-scale data:
computing the mean and covariance of multivariate data and performing linear regression
over encrypted datasets. Also, advancing the Homomorphic Encryption (HE) software library [18]
from its current "proof of concept" state into a fully homomorphic encryption implementation
applicable to real-world tasks, such as network guards and other computations on encrypted data
like curve fitting (e.g., least-squares fit), requires cooperation from PROCEED program partners
and the cryptographic research community at large.

Even with the major advances in the state of HE over the last few years, both the size of HE
ciphertexts and the complexity of computing on them remain quite high. An obvious direction for
future work is to find additional optimizations to reduce this overhead further. One direction
that was not explored but seems to have large potential is finding a cheaper replacement for
Gentry's bootstrapping technique. Namely, it is still plausible that we can reduce the noise in the
ciphertext by applying a cheaper transformation than full homomorphic decryption.

The team has worked intensely on the AHEAD project, collaborating with other participants
within the PROCEED program, and has delivered results at a pace that surprised even us! IBM
has enjoyed working on the project and will step away from it (owing to limited funds), while
Stanford and UCSD continue to work within the PROCEED program.
Abstract

We  describe  a  new  approach  for  constructing  fully  homomorphic  encryption  (FHE)  schemes. 
Previous FHE schemes all use the same blueprint from [Gentry 2009]: First construct a somewhat
homomorphic encryption (SWHE) scheme, next “squash” the decryption circuit until it is
simple  enough  to  be  handled  within  the  homomorphic  capacity  of  the  SWHE  scheme,  and  finally 
“bootstrap”  to  get  a  FHE  scheme.  In  all  existing  schemes,  the  squashing  technique  induces  an 
additional  assumption:  that  the  sparse  subset  sum  problem  (SSSP)  is  hard. 

Our new approach constructs FHE as a hybrid of a SWHE and a multiplicatively homomorphic
encryption (MHE) scheme, such as Elgamal. Our construction eliminates the need for the
squashing step, and thereby also removes the need to assume the SSSP is hard. We describe
a few concrete instantiations of the new method, including a “simple” FHE scheme where we
replace SSSP with Decision Diffie-Hellman, an optimization of the simple scheme that lets us
“compress” the FHE ciphertext into a single Elgamal ciphertext (!), and a scheme whose security
can be (quantumly) reduced to the approximate ideal-SIVP.

We stress that the new approach still relies on bootstrapping, but it shows how to bootstrap
without having to “squash” the decryption circuit. The main technique is to express the decryption
function of SWHE schemes as a depth-3 ($\sum\prod\sum$) arithmetic circuit of a particular form.

When  evaluating  this  circuit  homomorphically  (as  needed  for  bootstrapping),  we  temporarily 
switch to a MHE scheme, such as Elgamal, to handle the $\prod$ part. Due to the special form of
the  circuit,  the  switch  to  the  MHE  scheme  can  be  done  without  having  to  evaluate  anything 
homomorphically.  We  then  translate  the  result  back  to  the  SWHE  scheme  by  homomorphically 
evaluating  the  decryption  function  of  the  MHE  scheme.  Using  our  method,  the  SWHE  scheme 
only needs to be capable of evaluating the MHE scheme’s decryption function, not its own
decryption function. We thereby avoid the circularity that necessitated squashing in the original
blueprint. 

Key words. Arithmetic Circuits, Depth-3 Circuits, Homomorphic Encryption, Symmetric Polynomials
1 Introduction


Fully homomorphic encryption allows anyone to perform arbitrary computations on encrypted
data, despite not having the secret decryption key. Several fully homomorphic encryption (FHE)
schemes appeared recently [Gen09b, vDGHV10, SV10, GH11], all following the same blueprint as
Gentry’s original construction [Gen09b, Gen09a]:

1.  SWHE.  Construct  a  somewhat  homomorphic  encryption  (SWHE)  scheme  -  roughly,  a  scheme 
that  can  evaluate  low-degree  polynomials  homomorphically. 

2.  Squash.  “Squash”  the  decryption  function  of  the  SWHE  scheme,  until  decryption  can  be 
expressed as a polynomial of degree low enough to be handled within the homomorphic capacity of
the  SWHE  scheme,  with  enough  capacity  left  over  to  evaluate  a  NAND  gate.  This  is  done  by 
adding  a  “hint”  to  the  public  key  -  namely,  a  large  set  of  elements  that  has  a  secret  sparse  subset 
that  sums  to  the  original  secret  key. 

3.  Bootstrap.  Given  a  SWHE  scheme  that  can  evaluate  its  decryption  function  (plus  a  NAND), 
apply Gentry’s transformation to get a “leveled”¹ FHE scheme.
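The “hint” in the Squash step can be pictured with a simplified model: a large public set of elements containing a secret sparse subset whose sum is the original secret key, so that the new secret key is the sparse 0/1 indicator vector of that subset. In Gentry's schemes the key and hint elements are rationals handled to limited precision; the sketch below uses integers modulo a hypothetical $M = 2^{64}$ purely to show the structure.

```python
import random


def make_hint(secret: int, modulus: int, big_n: int, sparse_m: int, rng: random.Random):
    """Return (hint, subset): big_n public elements whose secret sparse_m-subset sums to secret."""
    hint = [rng.randrange(modulus) for _ in range(big_n)]
    subset = rng.sample(range(big_n), sparse_m)
    partial = sum(hint[i] for i in subset[:-1]) % modulus
    hint[subset[-1]] = (secret - partial) % modulus  # fix the last element so the sum works out
    return hint, sorted(subset)


rng = random.Random(7)
M = 2**64
secret_key = rng.randrange(M)
hint, subset = make_hint(secret_key, M, big_n=1000, sparse_m=15, rng=rng)
# The new ("squashed") secret key is the sparse 0/1 indicator vector of `subset`.
chosen = set(subset)
indicator = [1 if i in chosen else 0 for i in range(len(hint))]
print(sum(h * bit for h, bit in zip(hint, indicator)) % M == secret_key)  # True
```

Security then requires that the subset cannot be found from the public hint, which is the sparse subset sum assumption this work removes.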

In  this  work  we  construct  leveled  FHE  by  combining  a  SWHE  scheme  with  a  “compatible” 
multiplicatively  homomorphic  encryption  (MHE)  scheme  (such  as  Elgamal)  in  a  surprising  way. 
Our  construction  still  relies  on  bootstrapping,  but  it  does  not  use  squashing  and  does  not  rely 
on  the  assumed  hardness  of  the  sparse  subset  sum  problem  (SSSP).  Using  the  new  method,  we 
construct  a  “simple”  leveled  FHE  scheme  where  SSSP  is  replaced  with  Decision  Diffie-Hellman.  We 
also  describe  an  optimization  of  this  simple  scheme  where  at  one  point  during  the  bootstrapping 
process,  the  entire  leveled  FHE  ciphertext  consists  of  a  single  MHE  (e.g.,  Elgamal)  ciphertext! 
Finally,  we  show  that  it  is  possible  to  replace  the  MHE  scheme  by  an  additively  homomorphic 
encryption  (AHE)  scheme  that  encrypts  discrete  logarithms.  This  allows  us  to  construct  a  leveled 
FHE  scheme  whose  security  is  based  entirely  on  the  worst-case  hardness  of  the  shortest  independent 
vector problem over ideal lattices (ideal-SIVP) (compare [Gen10]). As in Gentry’s original blueprint,
we  obtain  a  pure  FHE  scheme  by  assuming  circular  security.  At  present,  our  new  approach  does 
not  improve  efficiency,  aside  from  the  optimization  that  reduces  the  ciphertext  length. 

1.1  Our  Main  Technical  Innovation 

Our  main  technical  innovation  is  a  new  way  to  evaluate  homomorphically  the  decryption  circuits 
of  the  underlying  SWHE  schemes.  Decryption  in  these  schemes  involves  computing  a  threshold 
function,  that  can  be  expressed  as  a  multilinear  symmetric  polynomial.  Previous  works  [Gen09b, 
vDGHV10, SV10, GH11] evaluated those polynomials in the “obvious way” using boolean circuits.
Instead, here we use Ben-Or’s observation (reported in [NW97]) that multilinear symmetric
polynomials can be computed by depth-3 ($\sum\prod\sum$) arithmetic circuits over $\mathbb{Z}_p$ for large enough
prime $p$. Let $e_k(\cdot)$ be the $n$-variable degree-$k$ elementary symmetric polynomial, and consider a
vector $x = (x_1, \ldots, x_n) \in \{0,1\}^n$. The value of $e_k(x)$ is simply the coefficient of $z^{n-k}$ in the
univariate polynomial $P(z) = \prod_{i=1}^{n} (z + x_i)$. This coefficient can be computed by fixing an arbitrary
set $A = \{a_1, \ldots, a_{n+1}\} \subseteq \mathbb{Z}_p$, then evaluating the polynomial $P(z)$ at the points in $A$ to obtain

¹ In a “leveled” FHE scheme, the size of the public key is linear in the depth of the circuits to evaluate. A “pure”
FHE  scheme  (with  a  fixed-sized  public  key)  can  be  obtained  by  assuming  “circular  security”  -  namely,  that  it  is  safe 
to  encrypt  the  leveled  FHE  secret  key  under  its  own  public  key. 
$t_j = P(a_j)$, and finally interpolating the coefficient of interest as a linear combination of the $t_j$’s.
The resulting circuit has the form

$$e_k(x) = \sum_{j=1}^{n+1} \lambda_{jk} \prod_{i=1}^{n} (a_j + x_i) \pmod{p} \qquad (1)$$

where the $\lambda_{jk}$’s are the interpolation coefficients, which are some known constants in $\mathbb{Z}_p$. Any
multilinear symmetric polynomial over $x$ can be computed as a linear combination of the $e_k(x)$’s, and
thus has the same form (with different $\lambda$’s).
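Ben-Or's depth-3 formula can be checked numerically: evaluate $P(z) = \prod_i (z + x_i)$ at $n+1$ points (the product layer), then take a fixed linear combination of those values (the outer sum layer), with the constants $\lambda_{jk}$ read off the Lagrange basis polynomials. The choice $A = \{1, \ldots, n+1\}$ and the prime $p = 101$ are assumptions for the example; any $n+1$ distinct points in $\mathbb{Z}_p$ would do.

```python
from itertools import combinations
from math import prod


def mul_by_linear(coeffs: list, root: int, p: int) -> list:
    """Multiply a polynomial (coeffs[i] = coefficient of z^i) by (z - root) mod p."""
    out = [0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i + 1] = (out[i + 1] + c) % p
        out[i] = (out[i] - root * c) % p
    return out


def lagrange_basis(points: list, p: int) -> list:
    """basis[j][c] = coefficient of z^c in the Lagrange polynomial L_j(z) over Z_p."""
    basis = []
    for j, aj in enumerate(points):
        num, den = [1], 1
        for m, am in enumerate(points):
            if m != j:
                num = mul_by_linear(num, am, p)
                den = den * (aj - am) % p
        inv = pow(den, -1, p)
        basis.append([c * inv % p for c in num])
    return basis


def e_k_depth3(x: list, k: int, p: int) -> int:
    """e_k(x) = sum_j lambda_jk * prod_i (a_j + x_i) mod p, with A = {1, ..., n+1}."""
    n = len(x)
    points = list(range(1, n + 2))
    basis = lagrange_basis(points, p)
    total = 0
    for j, aj in enumerate(points):
        t_j = prod(aj + xi for xi in x) % p   # product layer: t_j = P(a_j)
        lam_jk = basis[j][n - k]              # interpolation constant for z^(n-k)
        total = (total + lam_jk * t_j) % p    # outer sum layer
    return total


def e_k_direct(x: list, k: int, p: int) -> int:
    """Sum of all products of k distinct coordinates of x, mod p."""
    return sum(prod(x[i] for i in S) for S in combinations(range(len(x)), k)) % p


x = [1, 0, 1, 1, 0, 1]
print([e_k_depth3(x, k, 101) for k in range(len(x) + 1)])  # [1, 4, 6, 4, 1, 0, 0]
```

For a 0/1 vector of Hamming weight $w$, $e_k(x) = \binom{w}{k}$, which is what the printed values show for $w = 4$. Note that the product layer has degree $n$, which is the degree increase mentioned below.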

By  itself,  Ben-Or’s  observation  is  not  helpful  to  us,  since  (until  now)  we  did  not  know  how  to 
bootstrap  unless  the  polynomial  degree  of  the  decryption  function  is  low.  Ben-Or’s  observation 
does not help us lower the degree (it actually increases the degree).² Our insight is that we can
evaluate the $\prod$ part by temporarily working with a MHE scheme, such as Elgamal [ElG85]. We
first use a simple trick to get an encryption under the MHE scheme of all the $(a_j + x_i)$ terms in
Ben-Or’s  circuit,  then  use  the  multiplicative  homomorphism  to  multiply  them,  and  finally  convert 
them  back  to  SWHE  ciphertexts  to  do  the  final  sum.  Conversion  back  from  MHE  to  SWHE  is 
done  by  running  the  MHE  scheme’s  decryption  circuit  homomorphically  within  the  SWHE  scheme, 
which  may  be  expensive.  However,  the  key  point  is  that  the  degree  of  the  translation  depends  only 
on the MHE scheme and not on the SWHE scheme. This breaks the self-referential requirement
of  being  able  to  evaluate  its  own  decryption  circuit,  hence  obviating  the  need  for  the  squashing 
step.  Instead,  we  can  now  just  increase  the  parameters  of  the  SWHE  scheme  until  it  can  handle 
the  MHE  scheme’s  decryption  circuit. 

1.2  An  Illustration  of  an  Elgamal-Based  Instantiation 

Perhaps  the  simplest  illustration  of  our  idea  is  using  Elgamal  encryption  to  do  the  multiplication. 
Let $p = 2q + 1$ be a safe prime. Elgamal messages and ciphertext components will live in QR($p$),
the group of quadratic residues modulo $p$. We also use a SWHE scheme with plaintext space $\mathbb{Z}_p$.
(All previous SWHE schemes can be adapted to handle this large plaintext space.) We also require
the SWHE scheme to have a “simple” decryption function that can be expressed as a “restricted”
depth-3 arithmetic circuit. These terms are defined later in Section 2; for now we just mention that
all known SWHE schemes [Gen09b, vDGHV10, SV10, GH11] meet this condition.

For simplicity of presentation here, imagine that the SWHE secret key is a bit vector $s = (s_1, \ldots, s_n) \in \{0,1\}^n$, the ciphertext that we want to decrypt is also a bit vector $c = (c_1, \ldots, c_n) \in \{0,1\}^n$, and that decryption works by first computing $x_i \leftarrow s_i \cdot c_i$ for all $i$, and then running the $\sum\prod\sum$ circuit, taking $x$ as input. Imagine that decryption simply performs something like interpolation, namely it computes $f(x) = \sum_{j=1}^{n+1} \lambda_j \prod_{i=1}^{n} (x_i + a_j)$, where the $a_j$'s and $\lambda_j$'s are publicly known constants in $\mathbb{Z}_p$.

To enable bootstrapping, we provide (in the public key) the Elgamal secret key encrypted under the SWHE public key; namely, we encrypt the bits of the secret Elgamal exponent $e$ individually under the SWHE scheme. We also use a special form of encryption of the SWHE secret key under the Elgamal public key. Namely, for each secret-key bit $s_i$ and each public constant $a_j$, we provide an Elgamal encryption of the value $a_j + s_i \in \mathbb{Z}_p$. The public values $a_j$'s are chosen so that both $a_j, a_j + 1 \in \mathrm{QR}(p)$, so that $a_j + s_i$ is always in the Elgamal plaintext space.³

² The degree of $P(z)$ is $n$, whereas in the previous blueprint Gentry’s squashing technique is used to ensure that the Hamming weight of $x$ is at most $m \ll n$, so that it suffices to compute $e_k(x)$ only for $k \le m$.

Now let $c \in \{0,1\}^n$ be a SWHE ciphertext that we want to decrypt homomorphically. First, for each $i, j$ we obtain an Elgamal ciphertext that encrypts $a_j + (s_i \cdot c_i)$ as follows: if $c_i = 0$ then $a_j + (s_i \cdot c_i) = a_j$, so we simply generate a fresh encryption of the public value $a_j$. On the other hand, if $c_i = 1$ then $a_j + (s_i \cdot c_i) = a_j + s_i$, so we use the encryption of $a_j + s_i$ from the public key. (Note how the “restricted” form of these sums $a_j + x_i$ makes it possible to put in the public key all the Elgamal ciphertexts that are needed for these sums.)
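The selection trick can be simulated with a deliberately tiny, insecure Elgamal instance. In this Python sketch the safe prime $p = 1019 = 2 \cdot 509 + 1$, the generator $g = 4$ of $\mathrm{QR}(p)$, the key length $n = 8$ and the random bits are all our own illustrative assumptions, and the SWHE secret key is just a list of bits. A message $m$ is encrypted in the form $(y, z) = (g^r, m \cdot g^{-er})$ used in this section. The sketch picks $a$ with $a, a+1 \in \mathrm{QR}(p)$, publishes encryptions of $a + s_i$, selects ciphertexts according to the bits $c_i$, and multiplies them componentwise, which yields an encryption of $\prod_i (a + s_i c_i)$.

```python
import random
from math import prod

# Toy parameters (far too small to be secure): safe prime p = 2q + 1 with q prime.
p, q = 1019, 509
g = 4          # a square other than 1, so it generates QR(p), which has prime order q


def is_qr(x):
    """Euler's criterion: nonzero x is a quadratic residue iff x^q = 1 mod p."""
    return x % p != 0 and pow(x, q, p) == 1


def keygen():
    e = random.randrange(1, q)
    return e, pow(g, -e, p)                    # secret exponent e, public h = g^(-e)


def enc(h, m):
    r = random.randrange(1, q)
    return pow(g, r, p), m * pow(h, r, p) % p  # (y, z) = (g^r, m * g^(-e r))


def dec(e, ct):
    y, z = ct
    return z * pow(y, e, p) % p                # m = z * y^e


def mul(ct1, ct2):
    """Multiplicative homomorphism: componentwise product."""
    return ct1[0] * ct2[0] % p, ct1[1] * ct2[1] % p


random.seed(0)
e, h = keygen()
n = 8
s = [random.randint(0, 1) for _ in range(n)]   # stand-in SWHE secret-key bits
a = next(x for x in range(1, p - 1) if is_qr(x) and is_qr(x + 1))
pk_terms = [enc(h, a + si) for si in s]        # public key: Enc(a + s_i)

c = [random.randint(0, 1) for _ in range(n)]   # bits of the ciphertext to decrypt
# c_i = 1: reuse Enc(a + s_i) from the public key; c_i = 0: fresh Enc(a)
selected = [pk_terms[i] if c[i] else enc(h, a) for i in range(n)]
product = selected[0]
for ct in selected[1:]:
    product = mul(product, ct)

assert dec(e, product) == prod(a + si * ci for si, ci in zip(s, c)) % p
```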

Next we use Elgamal’s multiplicative homomorphism for the $\prod$ part of the circuit, thus getting Elgamal encryptions of the values $\lambda_j \cdot P(a_j)$ (where $P(z) = \prod_{i=1}^{n} (z + x_i)$). We then convert these Elgamal encryptions into SWHE encryptions of the same values in $\mathbb{Z}_p$ by homomorphically evaluating the Elgamal decryption, using the SWHE encryption of the Elgamal secret exponent from the public key. Denote by $e_i$ the $i$'th bit of the secret exponent $e$ (so the public key includes an SWHE encryption of $e_i$), and let $(y, z) = (g^r, m \cdot g^{-er})$ be an Elgamal ciphertext to be converted. We compute $y^{2^i} - 1 \bmod p$ for all $i$, then compute SWHE ciphertexts that encrypt the powers

$$y^{e_i 2^i} = e_i \cdot y^{2^i} + (1 - e_i) \cdot y^0 = e_i \cdot (y^{2^i} - 1) + 1,$$

and then use the multiplicative homomorphism of the SWHE scheme to multiply these powers and obtain an encryption of $y^e$. (This requires degree $\lceil \log q \rceil$.) Finally, inside the SWHE scheme, we multiply the encryption of $y^e$ by the known value $z$, thereby obtaining a SWHE ciphertext that encrypts $m$.
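The conversion step can be checked in the clear. In the Python sketch below, plain integers modulo $p$ stand in for the SWHE plaintexts (no SWHE scheme is implemented), and the toy parameters $p = 1019$, $q = 509$, $g = 4$ are our own choices. Each factor $e_i (y^{2^i} - 1) + 1$ is linear in one secret bit, so the product over the $\lceil \log q \rceil$ bits has degree $\lceil \log q \rceil$, and multiplying by the public $z$ does not raise the degree.

```python
import random
from math import prod

p, q, g = 1019, 509, 4                 # toy safe prime p = 2q + 1; g generates QR(p)
ell = (q - 1).bit_length()             # ceil(log2 q) bits hold any exponent in Z_q


def elgamal_enc(e, m, r):
    """(y, z) = (g^r, m * g^(-e r)), the ciphertext form used in the text."""
    return pow(g, r, p), m * pow(g, -e * r, p) % p


def translate(y, z, e_bits):
    """Compute m = z * y^e as a polynomial in the secret bits e_i.

    In the scheme the e_i are SWHE-encrypted and every operation below is done
    homomorphically; here plain integers mod p stand in for those plaintexts."""
    factors = []
    for i, ei in enumerate(e_bits):
        t = (pow(y, 2 ** i, p) - 1) % p        # public: y^(2^i) - 1
        factors.append((ei * t + 1) % p)       # = y^(e_i 2^i), degree 1 in e_i
    y_to_e = prod(factors) % p                 # product of ell factors: degree ell
    return z * y_to_e % p                      # z is public, so the degree stays ell


random.seed(2)
for _ in range(100):
    e = random.randrange(1, q)
    m = pow(g, random.randrange(q), p)         # random element of QR(p)
    y, z = elgamal_enc(e, m, random.randrange(1, q))
    e_bits = [(e >> i) & 1 for i in range(ell)]
    assert translate(y, z, e_bits) == m
```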

At this point, we have SWHE ciphertexts that encrypt the results of the $\prod$ operations, i.e., the values $\lambda_j \cdot P(a_j)$. We now use the SWHE scheme’s additive homomorphism to finish off the depth-3 circuit, thus completing the homomorphic decryption. We can now compute another MULT or ADD operation, before running homomorphic decryption again to “refresh” the result, ad infinitum.

As explained above, using this approach the SWHE scheme only needs to evaluate polynomials that are slightly more complex than the MHE scheme’s decryption circuit. Specifically, for Elgamal we need to evaluate polynomials of degree $2\lceil \log q \rceil$. We can use any of the prior SWHE schemes from the literature, and set the parameters large enough to handle these polynomials. The security of the resulting leveled FHE scheme is based on the security of its component SWHE and MHE schemes.

We also show that by a careful choice of the constants $a_j$, we can set things up so that we always have $P(a_j) = w_j \cdot P(a_1)^{e_j}$ for some known constants $e_j, w_j \in \mathbb{Z}_p$. Hence we can compute all the Elgamal ciphertexts at the output of the $\prod$ layer given just the Elgamal ciphertext that encrypts $P(a_1)$, which yields a compact representation of the ciphertext.

1.3 Leveled FHE Based on Worst-Case Hardness

We use similar ideas to get a leveled FHE scheme whose security is based entirely on the (quantum) worst-case hardness of ideal-SIVP. At first glance this may seem surprising: how can we use a lattice-based scheme as our MHE scheme when current lattice-based schemes do not handle multiplication very well? (This was the entire reason the old blueprint required squashing!) We get around this apparent problem by replacing the MHE scheme with an additively homomorphic encryption (AHE) scheme, applied to discrete logs.

³ An amusing exercise: Prove that the number of $a_j$'s with $a_j, a_j + 1 \in \mathrm{QR}(p)$ is $(p - 3)/4$ when $p \equiv 3 \pmod 4$ and $(p - 5)/4$ when $p \equiv 1 \pmod 4$. See Lemma 5 for the answer.

In more detail, as in the Elgamal-based instantiation, the SWHE scheme uses plaintext space $\mathbb{Z}_p$ for prime $p = 2q + 1$. But $p$ is chosen to be a small prime, polynomial in the security parameter, so it is easy to compute discrete logs modulo $p$. The plaintext space of the AHE scheme is $\mathbb{Z}_q$, corresponding to the space of exponents of a generator $g$ of $\mathrm{QR}(p)$. Rather than encrypting in the public key the values $a_j + s_i$ (as in the Elgamal instantiation), we provide AHE ciphertexts that encrypt the values $\mathrm{DL}_g(a_j + s_i) \in \mathbb{Z}_q$, and use the same trick as above to get AHE ciphertexts that encrypt the values $\mathrm{DL}_g(a_j + (s_i \cdot c_i))$. We homomorphically add these values, getting an AHE encryption of $\mathrm{DL}_g(\lambda_j \cdot P(a_j))$. Finally, we use the SWHE scheme to homomorphically compute the AHE decryption followed by exponentiation, getting SWHE encryptions of the values $\lambda_j \cdot P(a_j)$, which we add within the SWHE scheme to complete the bootstrapping.
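The exponent-side arithmetic of this variant can likewise be sketched in Python. The sketch makes several simplifying assumptions: it uses the toy prime $p = 23 = 2 \cdot 11 + 1$ with $g = 2$ (which has order 11 modulo 23, so it generates $\mathrm{QR}(23)$), it computes discrete logs by table lookup, and it does not implement any AHE or SWHE scheme, tracking only the values that the AHE ciphertexts would encrypt. It checks that adding the selected discrete logs modulo $q$ and exponentiating recovers $\prod_i (a_j + s_i c_i)$ for every admissible $a_j$.

```python
from math import prod

# Toy parameters: p = 2q + 1 is small, so discrete logs are a table lookup.
p, q = 23, 11
g = 2                                        # 2 has order 11 mod 23, so it generates QR(23)
dlog = {pow(g, k, p): k for k in range(q)}   # DL_g : QR(p) -> Z_q
assert len(dlog) == q

s = [1, 0, 1, 1, 0, 1]                       # stand-in SWHE secret-key bits
c = [1, 1, 0, 1, 1, 0]                       # bits of the ciphertext being decrypted
A = [a for a in range(1, p - 1) if a in dlog and (a + 1) in dlog]   # a, a+1 in QR(p)

for a in A:
    # Public key: DL_g(a + s_i). These are the AHE plaintexts; the AHE scheme
    # itself is not modeled, only the values its ciphertexts would carry.
    pk_dl = [dlog[a + si] for si in s]
    # Same selection trick as with Elgamal: DL_g(a + s_i * c_i)
    terms = [pk_dl[i] if c[i] else dlog[a] for i in range(len(s))]
    total = sum(terms) % q                   # homomorphic addition, mod q
    # Decryption followed by exponentiation (done inside SWHE in the scheme)
    assert pow(g, total, p) == prod(a + si * ci for si, ci in zip(s, c)) % p
```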

As before, the SWHE scheme only needs to support the AHE decryption (and exponentiation modulo the small prime $p$), thus we don’t have the self-reference problem that requires squashing. We note, however, that lattice-based additively homomorphic schemes are not completely error-free, so one must set the parameters so that the scheme supports a sufficient number of summands. Since the dependence of the AHE noise on the number of summands is very weak (only logarithmic), this can be done without the need for squashing. See Section A.3 for more details on this construction.

2 Decryption as a Depth-3 Arithmetic Circuit

Recall that, in Gentry’s FHE, we “refresh” a ciphertext $c$ by expressing decryption of this ciphertext as a function $D_c(s)$ of the secret key $s$, and evaluating that function homomorphically. Below, we describe “restricted” depth-3 circuits, sketch a “generic” lattice-based construction that encompasses known SWHE schemes (up to minor modifications), and show how to express its decryption function $D_c(s)$ as a restricted depth-3 circuit over a large enough ring $\mathbb{Z}_p$. We note that Klivans and Sherstov [KS06] have already shown that the decryption functions of Regev’s cryptosystems [Reg04, Reg09] can be computed using depth-3 circuits.

2.1 Restricted Depth-3 Arithmetic Circuits

In our construction, the circuit that computes $D_c(s)$ depends on the ciphertext $c$ only in a very restricted manner. By “restricted” we roughly mean that the bottom sums in the depth-3 circuit must come from a fixed (polynomial-size) set $\mathcal{L}$ of polynomials, where $\mathcal{L}$ itself is independent of the ciphertext. Thus, the bottom sums used in the circuit can depend on the ciphertext only to the extent that the ciphertext is used to select which and how many of the polynomials in $\mathcal{L}$ are used as bottom sums in the circuit.

Definition 1 (Restricted Depth-3 Circuit). Let $\mathcal{L} = \{L_j(x_1, \ldots, x_n)\}$ be a set of polynomials, all in the same $n$ variables. An arithmetic circuit $C$ is an $\mathcal{L}$-restricted depth-3 circuit over $(x_1, \ldots, x_n)$ if there exist multisets $S_1, \ldots, S_t \subseteq \mathcal{L}$ and constants $\lambda_0, \lambda_1, \ldots, \lambda_t$ such that

$$C(x) = \lambda_0 + \sum_{i=1}^{t} \lambda_i \cdot \prod_{L_j \in S_i} L_j(x_1, \ldots, x_n).$$

The degree of $C$ with respect to $\mathcal{L}$ is $d = \max_i |S_i|$ (we also call it the $\mathcal{L}$-degree of $C$).
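A restricted depth-3 circuit is a simple data structure: a fixed family $\mathcal{L}$, a constant $\lambda_0$, and a list of pairs $(\lambda_i, S_i)$. The Python sketch below is one hypothetical way to represent it (the class and field names are ours), with each multiset $S_i$ stored as a list of indices into $\mathcal{L}$ so that repeated factors are allowed; its `degree` method returns the $\mathcal{L}$-degree $\max_i |S_i|$. The example family $\{a + x_i\}$ mirrors the sets $\mathcal{L}_A$ used in this section.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod
from typing import Callable, Sequence


@dataclass
class RestrictedDepth3Circuit:
    """C(x) = lam0 + sum_i lam_i * prod_{L_j in S_i} L_j(x)  (mod p).

    `family` is the fixed, ciphertext-independent set of bottom polynomials;
    each multiset S_i is a list of indices into it (repeats allowed)."""
    p: int
    family: Sequence[Callable[[Sequence[int]], int]]
    lam0: int
    gates: Sequence[tuple[int, Sequence[int]]]      # (lam_i, S_i) pairs

    def degree(self) -> int:
        """The family-degree d = max_i |S_i|."""
        return max((len(S) for _, S in self.gates), default=0)

    def __call__(self, x: Sequence[int]) -> int:
        bottom = [L(x) % self.p for L in self.family]    # bottom sums
        total = self.lam0
        for lam, S in self.gates:
            total += lam * prod(bottom[j] for j in S)    # product gates
        return total % self.p                             # top sum


# Example family: index 0: 1 + x_0, 1: 1 + x_1, 2: 2 + x_0, 3: 2 + x_1
family = [lambda x, a=a, i=i: a + x[i] for a in (1, 2) for i in range(2)]
# C(x) = 3 + 2*(1 + x_0)(1 + x_1) + 5*(2 + x_0)^2  over Z_7
circuit = RestrictedDepth3Circuit(p=7, family=family, lam0=3,
                                  gates=[(2, [0, 1]), (5, [2, 2])])
```

For instance, `circuit((1, 4))` evaluates $3 + 2 \cdot 2 \cdot 5 + 5 \cdot 3^2 = 68 \equiv 5 \pmod 7$, and `circuit.degree()` is 2.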
Remark 1. In all our instantiations of decryption circuits for known SWHE schemes, the $L_j$'s happen to be linear. However, our generic construction in Section 3 does not require that they be linear (or even low degree).

To express decryption as a restricted circuit as above, we use Ben-Or’s observation that multilinear symmetric polynomials can be computed by restricted depth-3 arithmetic circuits that perform interpolation. Recall that a multilinear symmetric polynomial $M(x)$ is a symmetric polynomial where, for each $i$, every monomial is of degree at most 1 in $x_i$; that is, there are no higher powers of $x_i$. A simple fact is that every multilinear symmetric polynomial $M(x)$ is a linear combination of the elementary symmetric polynomials: $M(x) = \sum_{i=0}^{n} \ell_i \cdot e_i(x)$, where $e_i(x)$ is the sum of all degree-$i$ monomials that are the product of $i$ distinct variables. Also, for every symmetric polynomial $S(x)$, there is a multilinear symmetric polynomial $M(x)$ that agrees with $S(x)$ on all binary vectors $x \in \{0,1\}^n$. The reason is that $x_i^2 = x_i$ for $x_i \in \{0,1\}$, and therefore all higher powers in $S(x)$ can be “flattened”; the end result is multilinear symmetric. The following lemma states Ben-Or’s observation formally.

Lemma 1 (Ben-Or, reported in [NW97]). Let $p > n + 1$ be a prime, let $A \subseteq \mathbb{Z}_p$ have cardinality $n + 1$, let $x = (x_1, \ldots, x_n)$ be variables, and denote $\mathcal{L}_A \stackrel{\text{def}}{=} \{(a + x_i) : a \in A, 1 \le i \le n\}$. For every multilinear symmetric polynomial $M(x)$ over $\mathbb{Z}_p$, there is a circuit $C(x)$ such that:

• $C$ is an $\mathcal{L}_A$-restricted depth-3 circuit over $\mathbb{Z}_p$ such that $C(x) \equiv M(x)$ (in $\mathbb{Z}_p$).

• $C$ has $n + 1$ product gates of $\mathcal{L}_A$-degree $n$, one gate for each value $a_j \in A$, with the $j$'th gate computing the value $\lambda_j \cdot P(a_j) = \lambda_j \cdot \prod_i (a_j + x_i)$ for some scalar $\lambda_j$.

• A description of $C$ can be computed efficiently given the values $M(x)$ at all $x = 1^i 0^{n-i}$.

The final bullet clarifies that Ben-Or’s observation is constructive: we can compute the restricted depth-3 representation from any initial representation that lets us evaluate $M$. For completeness, we prove Lemma 1 in Appendix B.
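The constructive part of Lemma 1 can be illustrated directly. This Python sketch does not follow the proof in Appendix B; instead, as our own illustrative route, it sets up one linear equation per Hamming weight $w$, using that the $j$'th gate evaluated at $1^w 0^{n-w}$ equals $(a_j+1)^w a_j^{n-w}$, and solves for the $\lambda_j$'s by Gaussian elimination modulo $p$. The target function (“at least two ones”), $p = 13$, $n = 4$ and $A = \{1, \ldots, 5\}$ are hypothetical choices. Because both sides are multilinear symmetric, agreement on all binary inputs, which the final assertion checks, means they agree as polynomials.

```python
from itertools import product as all_vectors
from math import prod


def solve_mod_p(A, b, p):
    """Solve A lam = b over Z_p by Gauss-Jordan elimination (A invertible)."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] % p)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(vr - f * vc) % p for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


p, n = 13, 4                         # p > n + 1
A = list(range(1, n + 2))            # n + 1 distinct nonzero a_j (our choice)


def M_on_weight(w):
    """Hypothetical target: the symmetric function 'at least two ones',
    specified only through its values on x = 1^w 0^(n-w)."""
    return 1 if w >= 2 else 0


# Row w: gate j at 1^w 0^(n-w) equals (a_j + 1)^w * a_j^(n - w). This is a
# Vandermonde matrix in r_j = (a_j + 1)/a_j scaled by a_j^n, hence invertible.
rows = [[pow(a + 1, w, p) * pow(a, n - w, p) % p for a in A] for w in range(n + 1)]
lam = solve_mod_p(rows, [M_on_weight(w) for w in range(n + 1)], p)


def C(x):
    """The L_A-restricted circuit: n + 1 product gates, each of L_A-degree n."""
    return sum(l * prod(a + xi for xi in x) for l, a in zip(lam, A)) % p


assert all(C(x) == M_on_weight(sum(x)) for x in all_vectors((0, 1), repeat=n))
```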

In some cases, it is easier to work with univariate polynomials. The following fact, captured in Lemma 2, will be useful for us: Suppose $f(x)$ is an arbitrary univariate function and we want to compute $f(\sum_i b_i \cdot t_i)$, where the $b_i$'s are bits and the $t_i$'s are small (polynomial). Then, we can actually express this computation as a multilinear symmetric polynomial, and hence a restricted depth-3 circuit in the $b_i$'s.

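Lemma 2. Let $T, n$ be positive integers, and $f(x)$ a univariate polynomial over $\mathbb{Z}_p$ (for $p$ prime, $p > Tn + 1$). Then there is a multilinear symmetric polynomial $M_f(\cdot)$ on $Tn$ variables such that for all $t_1, \ldots, t_n \in \{0, \ldots, T\}$,

$$f(b_1 \cdot t_1 + \cdots + b_n \cdot t_n) = M_f(\underbrace{b_1, \ldots, b_1}_{t_1 \text{ times}}, \underbrace{0, \ldots, 0}_{T - t_1 \text{ times}}, \underbrace{b_2, \ldots, b_2}_{t_2 \text{ times}}, \underbrace{0, \ldots, 0}_{T - t_2 \text{ times}}, \ldots, \underbrace{b_n, \ldots, b_n}_{t_n \text{ times}}, \underbrace{0, \ldots, 0}_{T - t_n \text{ times}})$$

for all $b \in \{0,1\}^n$. Moreover, a representation of $M_f$ as an $\mathcal{L}_A$-restricted depth-3 circuit can be computed in time $\mathrm{poly}(Tn)$ given oracle access to $f$.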

Proof. Define a $Tn$-variate polynomial $g : \mathbb{Z}_p^{Tn} \to \mathbb{Z}_p$ as $g(x) = f(\sum_i x_i)$; then $g$ is symmetric and we have

$$f(b_1 \cdot t_1 + \cdots + b_n \cdot t_n) = g(\underbrace{b_1, \ldots, b_1}_{t_1 \text{ times}}, \underbrace{0, \ldots, 0}_{T - t_1 \text{ times}}, \underbrace{b_2, \ldots, b_2}_{t_2 \text{ times}}, \underbrace{0, \ldots, 0}_{T - t_2 \text{ times}}, \ldots, \underbrace{b_n, \ldots, b_n}_{t_n \text{ times}}, \underbrace{0, \ldots, 0}_{T - t_n \text{ times}}).$$

As noted above, there is a multilinear symmetric polynomial $M_f(x)$ that agrees with $g(x)$ on all 0-1 inputs. By Lemma 1, for any $A \subseteq \mathbb{Z}_p$ of size $Tn + 1$ we can compute an $\mathcal{L}_A$-restricted depth-3 circuit representation of $M_f(x)$ by evaluating $g(x)$ over the vectors $x = 1^i 0^{Tn-i}$, which can be done using the $f$-oracle. □
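Lemma 2 can be sanity-checked by brute force. The following Python sketch uses small hypothetical parameters ($p = 31$, $T = n = 3$, and $f(x) = x^3 + 5x + 7$). Rather than building the $\mathcal{L}_A$-restricted circuit (the job of Lemma 1), it builds $M_f$ as a combination $\sum_k c_k e_k(x)$ of elementary symmetric polynomials, with the $c_k$'s obtained from the oracle values $f(0), \ldots, f(Tn)$ by binomial inversion, and then checks the identity of Lemma 2 for every $t \in \{0, \ldots, T\}^n$ and $b \in \{0,1\}^n$.

```python
from itertools import combinations, product
from math import comb, prod

p, T, n = 31, 3, 3                  # p prime with p > T*n + 1
Tn = T * n


def f(x):
    """An arbitrary univariate function on Z_p (hypothetical example)."""
    return (x ** 3 + 5 * x + 7) % p


def expand(b, t):
    """b_i repeated t_i times, then T - t_i zeros, for each i (Tn bits total)."""
    out = []
    for bi, ti in zip(b, t):
        out += [bi] * ti + [0] * (T - ti)
    return out


# g(x) = f(sum x) is symmetric; on 0/1 inputs it depends only on w = |x|.
# Binomial inversion gives c_k with sum_k c_k * C(w, k) = f(w) for w = 0..Tn,
# so M_f = sum_k c_k e_k agrees with g on every 0/1 vector.
c = [sum((-1) ** (k - w) * comb(k, w) * f(w) for w in range(k + 1)) % p
     for k in range(Tn + 1)]


def M_f(x):
    """Multilinear symmetric polynomial sum_k c_k * e_k(x) mod p."""
    return sum(ck * sum(prod(S) for S in combinations(x, k))
               for k, ck in enumerate(c)) % p


for t in product(range(T + 1), repeat=n):
    for b in product((0, 1), repeat=n):
        lhs = f(sum(bi * ti for bi, ti in zip(b, t)))
        assert lhs == M_f(expand(b, t))
```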

2.2 Lattice-Based Somewhat-Homomorphic Cryptosystems

In GGH-type [GGH97] lattice-based encryption schemes, the public key describes some lattice $L \subset \mathbb{R}^n$ and the secret key is a rational matrix $S \in \mathbb{Q}^{n \times n}$ (related to the dual lattice $L^*$). In the schemes that we consider, the plaintext space is $\mathbb{Z}_p$ for a prime $p$, and an encryption of $m$ is a vector $c = v + e \in \mathbb{Z}^n$, where $v \in L$ and $e$ is a short noise vector satisfying $e \equiv m \pmod p$. It was shown in [Gen09a] that decryption can be implemented by computing $m \leftarrow c - \lceil c \cdot S \rfloor \bmod p$, where $\lceil \cdot \rfloor$ means rounding to the nearest integer. Moreover, the parameters can be set to ensure that ciphertexts are close enough to the lattice so that the vector $c \cdot S$ is less than $1/(2(N+1))$ away from $\mathbb{Z}^n$.

Somewhat similarly to [Gen09b], such schemes can be modified to make the secret key a bit vector $s \in \{0,1\}^N$, such that $S = \sum_{i=1}^{N} s_i \cdot T_i$ with the $T_i$'s public matrices. For example, the $s_i$'s could be the bit description of $S$ itself, and then each $T_i$ has only a single nonzero entry, of the form $2^j$ or $2^{-j}$ (for as many different values of $j$ as needed to describe $S$ with sufficient precision). Unlike in [Gen09b], the $T_i$'s in our setting contain no secret information; in particular, we do not require a sparse subset that sums up to $S$. The ciphertext $c$ from the original scheme is post-processed to yield $(c, \{u_i\}_{i=1}^{N})$ where $u_i = c \cdot T_i$, and the decryption formula becomes

$$m \leftarrow c - \Big\lceil \sum_{i=1}^{N} s_i \cdot u_i \Big\rfloor \bmod p.$$

Importantly, the coefficients of the $u_i$'s are output with only $\kappa = \lceil \log(N+1) \rceil$ bits of precision to the right of the binary point, just enough to ensure that the rounding remains correct in the decryption formula. For simplicity hereafter, we will assume that the plaintext vector is $m = (0, \ldots, 0, m)$, i.e., it has only one nonzero coefficient. Thus, the post-processed ciphertext becomes $(c, \{u_i\})$, where $c$ and the $u_i$'s are now numbers rather than vectors.

2.3 Decryption Using a Restricted Depth-3 Circuit

For the rest of this section, the details of the particular encryption scheme $\mathcal{E}$ are irrelevant except insofar as it has the following decryption formula: The secret key is $s \in \{0,1\}^N$, the ciphertext is post-processed to the form $(c, \{u_i\})$, and each $u_i$ is split into an integer part and a fractional part, $u_i = u'_i + 2^{-\kappa} \cdot u''_i$, such that the fractional part has only $\kappa = \lceil \log(N+1) \rceil$ bits of precision (namely, $u''_i$ is a $\kappa$-bit integer). The plaintext is recovered as:

$$m \leftarrow \underbrace{c - \sum_i s_i \cdot u'_i}_{\text{simple part}} - \underbrace{\Big\lceil 2^{-\kappa} \cdot \sum_i s_i \cdot u''_i \Big\rfloor}_{\text{complicated part}} \bmod p. \tag{2}$$

We now show that we can compute Equation (2) using an $\mathcal{L}_A$-restricted circuit.
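Equation (2) and the precision claim of Section 2.2 can be checked numerically with exact rationals. In the Python sketch below, the values $u_i$ are random rather than produced by a lattice scheme, and the guarantee that $\sum_i s_i u_i$ lies within $1/(2(N+1))$ of an integer is emulated by rejection sampling; $N = 6$ and $p = 101$ are our own choices. Each $u_i$ is rounded to $\kappa$ fractional bits and split into $u'_i$ and $u''_i$, and the assertion confirms that the simple and complicated parts give the same plaintext as exact rounding. The reasoning: each rounded $u_i$ is off by at most $2^{-\kappa-1} \le 1/(2(N+1))$, so over at most $N$ terms the drift is at most $N/(2(N+1))$, and together with the initial distance below $1/(2(N+1))$ the rounded sum stays strictly within $1/2$ of the same integer.

```python
import random
from fractions import Fraction


def round_nearest(x):
    """Nearest integer to the Fraction x (ties, which never arise here, go up)."""
    return (2 * x + 1) // 2            # floor(x + 1/2), an int


N, p = 6, 101
kappa = N.bit_length()                 # = ceil(log2(N + 1)) fractional bits
scale = 2 ** kappa

random.seed(1)
checked = 0
while checked < 200:
    s = [random.randint(0, 1) for _ in range(N)]
    u = [Fraction(random.randint(-10**6, 10**6), 2**20) for _ in range(N)]
    exact = sum(si * ui for si, ui in zip(s, u))
    # Stand-in for the parameter guarantee: keep only samples whose sum is
    # within 1/(2(N+1)) of an integer.
    if abs(exact - round_nearest(exact)) >= Fraction(1, 2 * (N + 1)):
        continue
    c = random.randrange(p)
    q = [round_nearest(ui * scale) for ui in u]    # u_i rounded to kappa bits
    u_int = [qi // scale for qi in q]              # u'_i  (integer part)
    u_frac = [qi % scale for qi in q]              # u''_i (a kappa-bit integer)
    m_exact = (c - round_nearest(exact)) % p
    simple = c - sum(si * a for si, a in zip(s, u_int))
    complicated = round_nearest(Fraction(sum(si * b for si, b in zip(s, u_frac)), scale))
    assert m_exact == (simple - complicated) % p   # Equation (2)
    checked += 1
```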

Lemma 3. Let $p$ be a prime $p > 2N^2$. Regarding the “complicated part” of Equation (2), there is a univariate polynomial $f(x)$ of degree $\le 2N^2$ such that $f(\sum_i s_i \cdot u''_i) = \lceil 2^{-\kappa} \cdot \sum_i s_i \cdot u''_i \rfloor \bmod p$.

Proof. Since $p > 2N^2$, there is a polynomial $f$ of degree at most $2N^2$ such that $f(x) = \lceil 2^{-\kappa} \cdot x \rfloor \bmod p$ for all $x \in [0, 2N^2]$. The lemma follows from the fact that $\sum_i s_i \cdot u''_i \in [0, N \cdot (2^\kappa - 1)] \subseteq [0, 2N^2]$ (recall that $2^\kappa < 2(N+1)$, so $2^\kappa - 1 \le 2N$). □
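The polynomial of Lemma 3 can be constructed explicitly by interpolation. The Python sketch below takes toy values $N = 3$ (so $\kappa = 2$ and $2N^2 = 18$) and the prime $p = 37 > 2N^2$, which are our own choices, interpolates $x \mapsto \lceil 2^{-\kappa} x \rfloor$ over $\mathbb{Z}_p$ on the $2N^2 + 1$ points $0, \ldots, 2N^2$, and checks that the result has degree at most $2N^2$ and agrees with the rounding on the whole range (ties are rounded up, an assumption of the sketch).

```python
def times_linear(poly, c, p):
    """Multiply poly (coefficients, lowest degree first) by (z - c), mod p."""
    out = [0] * (len(poly) + 1)
    for d, coef in enumerate(poly):
        out[d + 1] = (out[d + 1] + coef) % p
        out[d] = (out[d] - c * coef) % p
    return out


def interpolate(xs, ys, p):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    < len(xs) through the points (xs[i], ys[i]) over Z_p (Lagrange form)."""
    coeffs = [0] * len(xs)
    for j, xj in enumerate(xs):
        basis, denom = [1], 1
        for m, xm in enumerate(xs):
            if m != j:
                basis = times_linear(basis, xm, p)
                denom = denom * (xj - xm) % p
        weight = ys[j] * pow(denom, -1, p) % p
        for d, bd in enumerate(basis):
            coeffs[d] = (coeffs[d] + weight * bd) % p
    return coeffs


def horner(coeffs, x, p):
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc


N = 3
kappa = N.bit_length()           # ceil(log2(N + 1)) = 2
bound = 2 * N * N                # 2N^2 = 18
p = 37                           # any prime p > 2N^2


def rounded(x):
    """Nearest integer to 2^-kappa * x for integer x >= 0 (ties round up)."""
    return (2 * x + 2 ** kappa) // 2 ** (kappa + 1)


xs = list(range(bound + 1))
f = interpolate(xs, [rounded(x) % p for x in xs], p)
assert len(f) <= bound + 1                                 # degree <= 2N^2
assert all(horner(f, x, p) == rounded(x) % p for x in xs)
assert N * (2 ** kappa - 1) <= bound                       # range of sum_i s_i u''_i
```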
Theorem 1. Let $p$ be a prime $p > 2N^2$. For any $A \subseteq \mathbb{Z}_p$ of cardinality at least $2N^2 + 1$, $\mathcal{E}$’s decryption function (Equation (2)) can be efficiently expressed as and computed using an $\mathcal{L}_A$-restricted depth-3 circuit $C$ of $\mathcal{L}_A$-degree at most $2N^2$ having at most $2N^2 + N + 1$ product gates.


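Proof. First, consider the “complicated part”. By Lemma 3, there is a univariate polynomial $f(x)$ of degree $2N^2$ such that $f(\sum_i s_i \cdot u''_i) = \lceil 2^{-\kappa} \cdot \sum_i s_i \cdot u''_i \rfloor \bmod p$. Since all $u''_i \in \{0, \ldots, 2N\}$, by Lemma 2 there is a multilinear symmetric polynomial $M_f(x)$ taking $2N^2$ inputs such that

$$f\Big(\sum_i s_i \cdot u''_i\Big) = M_f(\underbrace{s_1, \ldots, s_1}_{u''_1 \text{ times}}, \underbrace{0, \ldots, 0}_{2N - u''_1 \text{ times}}, \ldots, \underbrace{s_N, \ldots, s_N}_{u''_N \text{ times}}, \underbrace{0, \ldots, 0}_{2N - u''_N \text{ times}})$$

for all $s \in \{0,1\}^N$, and moreover we can efficiently compute $M_f$'s representation as an $\mathcal{L}_A$-restricted depth-3 circuit $C$. By Lemma 1, $C$ has $\mathcal{L}_A$-degree at most $2N^2$ and has at most $2N^2 + 1$ product gates. We have proved the theorem for the complicated part. To handle the “simple part” as an $\mathcal{L}_A$-restricted circuit, we can rewrite it as $(c + a_1 \cdot \sum_i u'_i) - \sum_i (a_1 + s_i) \cdot u'_i \bmod p$ with the constant term $\lambda_0 = (c + a_1 \cdot \sum_i u'_i) \bmod p$. The circuit for the simple part has $\mathcal{L}_A$-degree 1 and $N$ “product” gates. □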


In Section 4.2, we show how to tweak the “generic” lattice-based decryption further to allow a purely multilinear symmetric decryption formula. (Above, only the complicated part is multilinear symmetric.) While not essential to construct leveled FHE schemes, this tweak enables interesting optimizations. For example, in Section 4.1 we show that we can get a very compact leveled FHE ciphertext: at one point, it consists of a single MHE ciphertext (e.g., a single Elgamal ciphertext!). This single MHE ciphertext encrypts the value $P(a_1)$, and we show how (through a clever choice of the $a_j$'s) to derive MHE ciphertexts that encrypt $P(a_j)$ for the other $j$'s.


3 Leveled FHE from SWHE and MHE

Here, we show how to take a SWHE scheme that has restricted depth-3 decryption and a MHE scheme, and combine them into a “monstrous chimera” [Wik11] to obtain leveled FHE. The construction works much like the Elgamal-based example given in the Introduction. That is, given a SWHE ciphertext, we “recrypt” it by homomorphically evaluating its depth-3 decryption circuit: we pre-process the first level of linear polynomials $L_j(s)$ (where $s$ is the secret key) by encrypting them under the MHE scheme, evaluate the products under the MHE scheme, convert the MHE ciphertexts into SWHE ciphertexts of the same values by evaluating the MHE scheme’s decryption function under the SWHE scheme using the encrypted MHE secret key, and finally perform the final sum (an interpolation) under the SWHE scheme. The SWHE scheme only needs to be capable of evaluating the MHE scheme’s decryption circuit, followed by a quadratic polynomial. Contrary to the old blueprint, the required “homomorphic capacity” of the SWHE scheme is largely independent of the SWHE scheme’s decryption function.

3.1 Notations

Recall that an encryption scheme $\mathcal{E} = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Eval})$ with plaintext space $\mathcal{P}$ is somewhat homomorphic (SWHE) with respect to a class $\mathcal{F}$ of multivariate functions⁴ over $\mathcal{P}$ if for every $f(x_1, \ldots, x_n) \in \mathcal{F}$ and every $m_1, \ldots, m_n \in \mathcal{P}$, it holds (with probability one) that

$$\mathsf{Dec}(sk, \mathsf{Eval}(pk, f, c_1, \ldots, c_n)) = f(m_1, \ldots, m_n),$$

where $(sk, pk)$ are generated by $\mathsf{KeyGen}(1^\lambda)$ and the $c_i$'s are generated as $c_i \leftarrow \mathsf{Enc}(pk, m_i)$. We refer to $\mathcal{F}$ as the “homomorphic capacity” of $\mathcal{E}$. We say that $\mathcal{E}$ is multiplicatively (resp. additively) homomorphic if all the functions in $\mathcal{F}$ are naturally described as multiplication (resp. addition).

⁴ The class $\mathcal{F}$ may depend on the security parameter $\lambda$.

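Given the encryption scheme $\mathcal{E}$, we denote by $\mathcal{C}_{\mathcal{E}}(pk)$ the space of “freshly-encrypted ciphertexts” for the public key $pk$, namely the range of the encryption function for this public key. We also denote by $\mathcal{C}_{\mathcal{E}}$ the set of freshly-encrypted ciphertexts with respect to all valid public keys, and by $\mathcal{C}_{\mathcal{E},\mathcal{F}}$ the set of “evaluated ciphertexts” for a class of functions $\mathcal{F}$ (i.e., those that are obtained by homomorphically evaluating a function from $\mathcal{F}$ on ciphertexts from $\mathcal{C}_{\mathcal{E}}$). That is (for an implicit security parameter $\lambda$),

$$\mathcal{C}_{\mathcal{E}} \stackrel{\text{def}}{=} \bigcup_{pk \in \mathsf{KeyGen}} \mathcal{C}_{\mathcal{E}}(pk), \qquad \mathcal{C}_{\mathcal{E},\mathcal{F}} \stackrel{\text{def}}{=} \{\mathsf{Eval}(pk, f, \vec{c}) : pk \in \mathsf{KeyGen},\ f \in \mathcal{F},\ \vec{c} \in \mathcal{C}_{\mathcal{E}}(pk)\}.$$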

3.2 Compatible SWHE and MHE Schemes

To construct “chimeric” leveled FHE, the component SWHE and MHE schemes must be compatible:

Definition 2 (Chimerically Compatible SWHE and MHE). Let SWHE be an encryption scheme with plaintext space $\mathbb{Z}_p$, which is somewhat homomorphic with respect to some class $\mathcal{F}$. Let MHE be a scheme with plaintext space $\mathcal{P} \subseteq \mathbb{Z}_p$, which is multiplicatively homomorphic with respect to another class $\mathcal{F}'$.

We say that SWHE and MHE are chimerically compatible if there exist a polynomial-size set $\mathcal{L} = \{L_j\}$ of polynomials and polynomial bounds $D$ and $B$ such that the following hold:

• For every ciphertext $c \in \mathcal{C}_{\mathsf{SWHE},\mathcal{F}}$, the function $D_c(sk) = \mathsf{SWHE.Dec}(sk, c)$ can be evaluated by an $\mathcal{L}$-restricted circuit over $\mathbb{Z}_p$ with $\mathcal{L}$-degree $D$. Moreover, an explicit description of this circuit can be computed efficiently given $c$.

• For any secret key $sk \in \mathsf{SWHE.KeyGen}$ and any polynomial $L_j \in \mathcal{L}$ we have $L_j(sk) \in \mathcal{P}$; i.e., evaluating $L_j$ on the secret key $sk$ lands us in the plaintext space of MHE.

• The homomorphic capacity $\mathcal{F}'$ of MHE includes all products of $D$ or fewer variables.

• The homomorphic capacity of SWHE is sufficient to evaluate the decryption of MHE followed by a quadratic polynomial (with polynomially many terms) over $\mathbb{Z}_p$. Formally, the number of product gates in all the $\mathcal{L}$-restricted circuits from the first bullet above is at most the bound $B$, and for any two vectors of MHE ciphertexts $\vec{c} = (c_1, \ldots, c_b)$ and $\vec{c}\,' = (c'_1, \ldots, c'_{b'})$ with entries in $\mathcal{C}_{\mathsf{MHE},\mathcal{F}'}$ and $b, b' \le B$, the two functions

$$\mathsf{DAdd}_{\vec{c},\vec{c}\,'}(sk) \stackrel{\text{def}}{=} \sum_{i=1}^{b} \mathsf{MHE.Dec}(sk, c_i) + \sum_{i=1}^{b'} \mathsf{MHE.Dec}(sk, c'_i) \bmod p$$

$$\mathsf{DMul}_{\vec{c},\vec{c}\,'}(sk) \stackrel{\text{def}}{=} \Big(\sum_{i=1}^{b} \mathsf{MHE.Dec}(sk, c_i)\Big) \cdot \Big(\sum_{i=1}^{b'} \mathsf{MHE.Dec}(sk, c'_i)\Big) \bmod p$$

are within the homomorphic capacity of SWHE, i.e., $\mathsf{DAdd}_{\vec{c},\vec{c}\,'}, \mathsf{DMul}_{\vec{c},\vec{c}\,'} \in \mathcal{F}$.
We note that the question of whether two schemes are compatible may depend crucially on the exact representation of the secret keys and ciphertexts in both. Consider for example our Elgamal instantiation from the introduction. While a naive implementation of exponentiation would have exponential degree, certainly too high to be evaluated by any known SWHE scheme, we were able to post-process the Elgamal ciphertext so as to make the degree of decryption more manageable.

We also note that we can view “additively homomorphic encryption of discrete logarithms” as a particular type of multiplicatively homomorphic scheme, where encryption includes taking the discrete logarithm (assuming that it can be done efficiently) and decryption includes exponentiation.


3.3 Chimeric Leveled FHE: The Construction


Let SWHE and MHE be chimerically compatible schemes. We construct a leveled FHE scheme as follows:

FHE.KeyGen$(\lambda, \ell)$: Takes as input the security parameter $\lambda$ and the number of circuit levels $\ell$ that the composed scheme should be capable of evaluating. For $i \in [1, \ell]$, run

$$(pk_{SW}^{(i)}, sk_{SW}^{(i)}) \xleftarrow{R} \mathsf{SWHE.KeyGen}, \qquad (pk_{MH}^{(i)}, sk_{MH}^{(i)}) \xleftarrow{R} \mathsf{MHE.KeyGen}.$$

Encrypt the $i$'th MHE secret key under the $(i+1)$'st SWHE public key, $\overline{sk}_{MH}^{(i)} \leftarrow \mathsf{SWHE.Enc}(pk_{SW}^{(i+1)}, sk_{MH}^{(i)})$. Also encrypt the $i$'th SWHE secret key under the $i$'th MHE public key, but in a particular format as follows: Recall that there is a polynomial-size set of polynomials $\mathcal{L}$ such that SWHE decryption can be computed by $\mathcal{L}$-restricted circuits. To encrypt $sk_{SW}^{(i)}$ under $pk_{MH}^{(i)}$, compute $m_{ij} \leftarrow L_j(sk_{SW}^{(i)})$ for all $L_j \in \mathcal{L}$, and then encrypt it as $c_{ij} \leftarrow \mathsf{MHE.Enc}(pk_{MH}^{(i)}, m_{ij})$. Let $\overline{sk}_{SW}^{(i)}$ denote the collection of all the $c_{ij}$'s. The public key $pk_{FH}$ consists of $(pk_{SW}^{(i)}, pk_{MH}^{(i)})$ and the encrypted secret keys $(\overline{sk}_{SW}^{(i)}, \overline{sk}_{MH}^{(i)})$ for all $i$. The secret key $sk_{FH}$ consists of $sk_{SW}^{(i)}$ for all $i$.

FHE.Enc$(pk_{FH}, m)$: Takes as input the public key $pk_{FH}$ and a message in the plaintext space of the SWHE scheme. It outputs $\mathsf{SWHE.Enc}(pk_{SW}^{(1)}, m)$.

FHE.Dec$(sk_{FH}, c)$: Takes as input the secret key $sk_{FH}$ and a SWHE ciphertext. Suppose the ciphertext is encrypted under $pk_{SW}^{(i)}$. It is decrypted directly using $\mathsf{SWHE.Dec}(sk_{SW}^{(i)}, c)$.

FHE.Recrypt$(pk_{FH}, c)$: Takes as input the public key and a ciphertext $c$ that is a valid “evaluated SWHE ciphertext” under $pk_{SW}^{(i)}$, and outputs a “refreshed” SWHE ciphertext $c^*$, encrypting the same plaintext but under $pk_{SW}^{(i+1)}$. It works as follows:

Circuit-generation. For a SWHE ciphertext $c$, generate a description of the $\mathcal{L}$-restricted circuit $C_c$ over $\mathbb{Z}_p$ that computes the decryption of $c$. Denote it by

$$C_c(sk) = \lambda_0 + \sum_{k=1}^{t} \lambda_k \prod_{L_j \in S_k} L_j(sk) \bmod p \qquad \big(= \mathsf{SWHE.Dec}(sk, c)\big).$$


Products. Pick up from the public key the encryptions under the MHE public key $pk_{MH}^{(i)}$ of the values $L_j(sk_{SW}^{(i)})$. Use the homomorphism of MHE to compute MHE encryptions of all the terms $\lambda_k \cdot \prod_{L_j \in S_k} L_j(sk_{SW}^{(i)})$. Denote the set of resulting MHE ciphertexts by $c_1, \ldots, c_t$.
Translation. Pick up from the public key the encryption under the SWHE public key $pk_{SW}^{(i+1)}$ of the MHE secret key $sk_{MH}^{(i)}$. For each MHE ciphertext $c_k$ from the Products step, use the homomorphism of SWHE to evaluate the function $D_{c_k}(sk) = \mathsf{MHE.Dec}(sk, c_k)$ on the encrypted secret key. The results are SWHE ciphertexts $c'_1, \ldots, c'_t$, where $c'_k$ encrypts the value $\lambda_k \cdot \prod_{L_j \in S_k} L_j(sk_{SW}^{(i)})$ under $pk_{SW}^{(i+1)}$.

Summation. Use the homomorphism of SWHE to sum up all the $c'_k$'s and add $\lambda_0$ to get a ciphertext $c^*$ that encrypts under $pk_{SW}^{(i+1)}$ the value

$$\lambda_0 + \sum_{k=1}^{t} \lambda_k \prod_{L_j \in S_k} L_j(sk_{SW}^{(i)}) \bmod p = \mathsf{SWHE.Dec}(sk_{SW}^{(i)}, c).$$

Namely, $c^*$ encrypts under $pk_{SW}^{(i+1)}$ the same value that was encrypted in $c$ under $pk_{SW}^{(i)}$.
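To make the data flow of Recrypt concrete, here is a toy Python walk-through under loudly stated assumptions: the MHE scheme is a small, insecure Elgamal instance over $\mathrm{QR}(1019)$ with ciphertexts $(g^r, m \cdot g^{-er})$; the SWHE layer is not implemented at all (its plaintexts are handled in the clear, so Translation and Summation become plain arithmetic mod $p$); and the circuit from Circuit-generation is a random $\mathcal{L}$-restricted circuit of our own over the family $\{a + s_i\}$, not an actual SWHE decryption circuit. The final assertion checks that the recrypted value equals the circuit evaluated directly on the secret key.

```python
import random
from math import prod

# Toy, insecure parameters: safe prime p = 2q + 1; g generates QR(p).
p, q, g = 1019, 509, 4
ell = (q - 1).bit_length()


def is_qr(x):
    return x % p != 0 and pow(x, q, p) == 1


def mhe_enc(h, m):                                    # toy Elgamal: (g^r, m * h^r)
    r = random.randrange(1, q)
    return pow(g, r, p), m * pow(h, r, p) % p


def mhe_mul(c1, c2):
    return c1[0] * c2[0] % p, c1[1] * c2[1] % p


random.seed(3)
N = 6
sk_sw = [random.randint(0, 1) for _ in range(N)]      # stand-in SWHE secret-key bits
A = [a for a in range(1, p - 1) if is_qr(a) and is_qr(a + 1)][:3]
family = [(a, i) for a in A for i in range(N)]        # L_{a,i}(sk) = a + sk_i

e = random.randrange(1, q)                            # MHE key pair, h = g^(-e)
h = pow(g, -e, p)
enc_sk_sw = {L: mhe_enc(h, L[0] + sk_sw[L[1]]) for L in family}   # in pk_FH
enc_sk_mh = [(e >> i) & 1 for i in range(ell)]        # "SWHE encryption" of e: plain bits

# Circuit-generation: a hypothetical L-restricted circuit standing in for C_c.
lam0 = random.randrange(p)
gates = [(random.randrange(p), random.choices(family, k=4)) for _ in range(5)]

# Products (under MHE): an encryption of lam_k * prod_{L in S_k} L(sk).
products = []
for lam, S in gates:
    ct = enc_sk_sw[S[0]]
    for L in S[1:]:
        ct = mhe_mul(ct, enc_sk_mh_placeholder) if False else mhe_mul(ct, enc_sk_sw[L])
    products.append((ct[0], lam * ct[1] % p))         # scale z by the public lam_k


# Translation (under SWHE in the scheme; plain arithmetic mod p here).
def translate(ct, e_bits):
    y, z = ct
    return z * prod((b * (pow(y, 2 ** i, p) - 1) + 1) % p
                    for i, b in enumerate(e_bits)) % p


translated = [translate(ct, enc_sk_mh) for ct in products]

# Summation (under SWHE): add lam0 to the translated terms.
recrypted = (lam0 + sum(translated)) % p

direct = (lam0 + sum(lam * prod(L[0] + sk_sw[L[1]] for L in S)
                     for lam, S in gates)) % p
assert recrypted == direct
```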

FHE.Add$(pk_{FH}, c_1, c_2)$ and FHE.Mult$(pk_{FH}, c_1, c_2)$: Take as input the public key and two ciphertexts that are valid evaluated SWHE ciphertexts under $pk_{SW}^{(i)}$. Ciphertexts within the SWHE scheme (at any level) may be added and multiplied within the homomorphic capacity of the SWHE scheme. Once the capacity is reached, they can be recrypted and then at least one more operation can be applied.

Theorem 2. If SWHE and MHE are chimerically compatible schemes, then the above scheme FHE is a leveled FHE scheme. Also, if both SWHE and MHE are semantically secure, then so is FHE.

Correctness follows in a straightforward manner from the definition of chimerically compatible schemes. Security follows by a standard hybrid argument similar to Theorem 4.2.3 in [Gen09a]. We omit the details.

4 Optimizations

In the Products step of the Recrypt process (see Section 3), we compute multiple products homomorphically within the MHE scheme. In Section 4.1, we provide an optimization that allows us to compute only a single product in the Products step. In Section 4.2, we extend this optimization so that the entire leveled FHE ciphertext after the Products step can consist of a single MHE ciphertext.

4.1 Computing Only One Product

For now, let us ignore the “simple part” of our decryption function (Equation (2)), which is linear and therefore does not involve any “real products”.

The products in the “complicated part” all have a special form. Specifically, by Theorem 1 and the preceding lemmas, for secret key $s \in \{0,1\}^N$, ciphertext $(c, \{u_i\})$, set $A \subseteq \mathbb{Z}_p$ with $|A| > 2N^2$, and fixed scalars $\{\lambda_j\}$ associated to a multilinear symmetric polynomial $M_f$, the products are all of the form $\lambda_j \cdot P(a_j)$ for all $a_j \in A$, where

$$P(z) = \prod_{i=1}^{N} (z + s_i)^{u''_i} \cdot (z + 0)^{2N - u''_i}.$$
We will show how to choose the $a_j$'s so that we can compute $P(a_j)$ for all $j$ given only $P(a_1)$. This may seem surprising, but observe that the $P(a_j)$'s are highly redundant. Namely, if we consider the integer $v = \sum_{i=1}^{N} s_i \cdot u''_i$ (which is at most $2N^2$), then we have

$$P(a_j) = (a_j + 1)^{v} \cdot (a_j + 0)^{2N^2 - v}.$$

Knowing $a_1$, the value of $P(a_1)$ contains enough information to deduce $v$, and then knowing $a_j$ we can get $P(a_j)$ for all $j$. To be able to compute the $P(a_j)$'s efficiently from $P(a_1)$, we choose the $a_j$'s so that for all $j > 1$ we know integers $(w_j, e_j)$ such that:

$$a_j = w_j \cdot a_1^{e_j} \quad \text{and} \quad a_j + 1 = w_j \cdot (a_1 + 1)^{e_j}.$$

We store $(w_j, e_j)$ in the public key, and then compute $P(a_j) = (w_j (a_1 + 1)^{e_j})^{v} \cdot (w_j a_1^{e_j})^{2N^2 - v} = w_j^{2N^2} \cdot P(a_1)^{e_j}$.
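The identity $P(a_j) = w_j^{2N^2} \cdot P(a_1)^{e_j}$ can be verified numerically. In the Python sketch below, the toy safe prime $p = 1019$, $N = 3$, $a_1 = 3$ and the exponents $e_j = 2, \ldots, 11$ are our own choices, and $(a_j, w_j)$ are derived from $(a_1, e_j)$ by the recipe $a_j = a_1^{e_j}/((a_1+1)^{e_j} - a_1^{e_j})$, $w_j = a_j / a_1^{e_j}$ described in this section. The sketch checks the identity for every possible $v \in [0, 2N^2]$.

```python
p = 1019                     # toy safe prime (2 * 509 + 1)
N = 3
K = 2 * N * N                # 2N^2


def derive(a1, ej):
    """(a_j, w_j) from a_1 and e_j: a_j = a1^e / ((a1+1)^e - a1^e), w_j = a_j / a1^e."""
    D = (pow(a1 + 1, ej, p) - pow(a1, ej, p)) % p
    w = pow(D, -1, p)
    return pow(a1, ej, p) * w % p, w


def P(z, v):
    """P(z) = (z + 1)^v * (z + 0)^(2N^2 - v), where v = sum_i s_i u''_i."""
    return pow(z + 1, v, p) * pow(z, K - v, p) % p


a1 = 3                                   # 3 and 4 are both squares mod 1019
for ej in range(2, 12):
    aj, wj = derive(a1, ej)
    assert aj == wj * pow(a1, ej, p) % p
    assert (aj + 1) % p == wj * pow(a1 + 1, ej, p) % p
    for v in range(K + 1):
        # P(a_j) = w_j^(2N^2) * P(a_1)^(e_j)
        assert P(aj, v) == pow(wj, K, p) * pow(P(a1, v), ej, p) % p
```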

Importantly for our application to chimeric FHE, we can compute an encryption of $P(a_j)$ from an encryption of $P(a_1)$ within the MHE scheme: simply use the multiplicative homomorphism to exponentiate by $e_j$ (using repeated squaring as necessary) and then multiply the result by $w_j^{2N^2}$.

Generating suitable tuples $(a_j, w_j, e_j)$ for $j > 1$ from an initial value $a_1$ is straightforward: we choose the $e_j$'s arbitrarily and then solve for the rest. Namely, we generate distinct $e_j$'s, different from 0 and 1, then set $a_j \leftarrow a_1^{e_j} / ((a_1 + 1)^{e_j} - a_1^{e_j})$ and $w_j = a_j / a_1^{e_j}$. Observe that $a_j + 1 = (a_1 + 1)^{e_j} / ((a_1 + 1)^{e_j} - a_1^{e_j})$, i.e., the ratio $(a_j + 1)/a_j = ((a_1 + 1)/a_1)^{e_j}$, as required.

Some care must be taken to ensure that the values $a_j, a_j + 1$ are in the plaintext space of the MHE scheme; e.g., for Elgamal they need to be quadratic residues. Recall the basic fact that for a safe prime $p$ there are $(p-3)/4$ values $a$ for which $a, a + 1 \in QR(p)$ (see Lemma 5). Therefore, finding suitable $a_1, a_1 + 1 \in QR(p)$ is straightforward. Since $a_1^{e_j}, (a_1 + 1)^{e_j} \in QR(p)$, we have

$$a_j, a_j + 1 \in QR(p) \iff (a_1 + 1)^{e_j} - a_1^{e_j} \in QR(p) \iff ((a_1 + 1)/a_1)^{e_j} - 1 \in QR(p).$$

If $(a_1 + 1)/a_1$ generates $QR(p)$ (which is certainly true if $p = 2q + 1$ is a safe prime, since $QR(p)$ then has prime order $q$ and $(a_1 + 1)/a_1 \neq 1$), then (re-using the basic fact above) we conclude that $a_j, a_j + 1 \in QR(p)$ with probability approximately 1/2 over the choice of $e_j$.

Observe that the amount of extra information needed in the public key is small. The $e_j$'s need not be truly random: indeed, by an averaging argument over the choice of $a_1$, one will quickly find an $a_1$ for which suitable $e_j$'s have constant density among very small integers. Hence it suffices to add to the public key only $O(\log p)$ bits to specify $a_1$.
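The identity $P(a_j) = w_j^{2N^2} \cdot P(a_1)^{e_j}$ and the tuple generation above can be checked numerically. The following Python sketch is a toy illustration, not the authors' implementation: it works in the clear (no encryption), uses the tiny safe prime $p = 1019 = 2 \cdot 509 + 1$, takes $e_j = 2, 3, \dots$ in order rather than at random, and the helper names (`find_a1`, `public_tuples`, `check_identity`) are hypothetical.

```python
import random

def is_qr(x, p):
    """Euler's criterion: True iff x is a nonzero quadratic residue mod the odd prime p."""
    x %= p
    return x != 0 and pow(x, (p - 1) // 2, p) == 1

def P(z, s, u2, total, p):
    """P(z) = prod_i (z + s_i)^{u''_i} * z^{total - sum_i u''_i} mod p, with total = 2N^2."""
    acc = pow(z, total - sum(u2), p)
    for si, ui in zip(s, u2):
        acc = acc * pow(z + si, ui, p) % p
    return acc

def find_a1(p):
    """Smallest a with both a and a + 1 in QR(p)."""
    for a in range(1, p - 1):
        if is_qr(a, p) and is_qr(a + 1, p):
            return a
    raise ValueError("no suitable a1")

def public_tuples(a1, count, p):
    """Hypothetical helper: derive (a_j, w_j, e_j) for e_j = 2, 3, ..., keeping QR-valid ones."""
    tuples, seen = [], {a1}
    for e in range(2, p):
        d = (pow(a1 + 1, e, p) - pow(a1, e, p)) % p
        if d == 0:
            continue
        w = pow(d, -1, p)              # w_j = 1 / ((a1+1)^e - a1^e)
        a = pow(a1, e, p) * w % p      # a_j = a1^e / ((a1+1)^e - a1^e)
        if a not in seen and is_qr(a, p) and is_qr(a + 1, p):
            tuples.append((a, w, e))
            seen.add(a)
            if len(tuples) == count:
                return tuples
    raise ValueError("not enough tuples")

def check_identity(seed, p=1019, N=3, count=5):
    """Verify P(a_j) == w_j^{2N^2} * P(a_1)^{e_j} mod p for a random toy key."""
    rng = random.Random(seed)
    total = 2 * N * N
    s = [rng.randint(0, 1) for _ in range(N)]     # toy secret-key bits
    u2 = [rng.randint(0, 4) for _ in range(N)]    # toy u''_i in {0, ..., 2^kappa}, kappa = 2
    a1 = find_a1(p)
    p1 = P(a1, s, u2, total, p)
    return all(P(a, s, u2, total, p) == pow(w, total, p) * pow(p1, e, p) % p
               for a, w, e in public_tuples(a1, count, p))
```

Only $a_1$ and the short list of $(w_j, e_j)$ pairs are needed; every $P(a_j)$ then follows from $P(a_1)$ by one exponentiation and one multiplication.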

4.2  Short  FHE  Ciphertexts:  Decryption  as  a  Pure  Symmetric  Polynomial 

Here we provide an optimization that allows us to compress the entire leveled FHE ciphertext down to a single MHE ciphertext, e.g., a single Elgamal ciphertext! (The optimization above compresses only the representation of the “complicated part” of Equation 2, not the “simple part”.) Typically, an MHE ciphertext will be much shorter than a SWHE ciphertext: a few thousand bits vs. millions of bits.

The main idea is that we do not need the full ciphertext $(c, \{u'_i\}, \{u''_i\})$ to recover $m$ if we know a priori that $m$ is in a small interval, e.g., $m \in \{0,1\}$. Rather, we can choose a “large-enough” polynomial-size prime $r$, so that we can recover $m$ just from $([c]_r, \{[u'_i]_r\}, \{[u''_i]_r\})$, where $[x]_r$ denotes $x \bmod r \in \{0, \dots, r-1\}$. Moreover, after reducing the ciphertext components modulo $r$, we can invoke Lemma 2 to represent decryption as a purely multilinear symmetric polynomial, whose output after the Products step can be represented by a single product $P(a_1)$ (like the complicated part in the optimization of Section 4.1).
Lemma 4. Let prime $p = \omega(N^2)$. There is a prime $r = O(N)$ and a univariate polynomial $f(x)$ of degree $O(N^2)$ such that, for all ciphertexts $(c, \{u'_i\}, \{u''_i\})$ that encrypt $m \in \{0,1\}$, we have $m = f(t_r) \bmod p$, where

$$t_r = [2^{\kappa} \cdot c]_r + \sum_{i=1}^{N} s_i \cdot [-2^{\kappa} \cdot u'_i - u''_i]_r. \qquad (3)$$


Proof. Let $t = 2^{\kappa}(c - \sum_i s_i \cdot u'_i) - \sum_i s_i \cdot u''_i$. The original decryption formula (Equation 2) is

$$m = c - \sum_i s_i \cdot u'_i - \left\lfloor 2^{-\kappa} \cdot \sum_i s_i \cdot u''_i \right\rceil = \left\lfloor 2^{-\kappa} \cdot t \right\rceil \bmod p.$$

Thus, $m$ can be recovered from $t$. Since there are only 2 possibilities for $m$, the (consecutive) support of $t$ has size $2^{\kappa+1} = O(N)$. Set $r$ to be a prime $> 2^{\kappa+1}$. Since the mapping $x \mapsto [x]_r$ has no collisions over the support of $t$, $t$ can be recovered from $[t]_r$. Note that $[t]_r = [t_r]_r$ (since $t \equiv t_r \pmod r$). Thus $m$ can be recovered from $t_r$ (via $[t_r]_r = [t]_r$, then $t$). Since there are $O(N \cdot r) = O(N^2)$ possibilities for $t_r$, the lemma follows. □
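The hashing step of the proof rests on the congruence $t \equiv t_r \pmod r$ together with the bound $0 \le t_r \le (N+1)(r-1)$. A minimal Python sketch, assuming random values in place of real SWHE ciphertext components (so it checks only this arithmetic, not decryption correctness), with toy parameters $N = 8$, $\kappa = 4$ and the prime $r = 37 > 2^{\kappa+1}$ chosen purely for illustration; the function names are hypothetical:

```python
import random

def t_exact(c, u1, u2, s, kappa):
    """t = 2^kappa * (c - sum_i s_i u'_i) - sum_i s_i u''_i, computed over the integers."""
    return (2**kappa * (c - sum(si * a for si, a in zip(s, u1)))
            - sum(si * b for si, b in zip(s, u2)))

def t_hashed(c, u1, u2, s, kappa, r):
    """t_r from Equation (3): each ciphertext component is reduced into {0, ..., r-1} first."""
    return (2**kappa * c) % r + sum(si * ((-(2**kappa) * a - b) % r)
                                    for si, a, b in zip(s, u1, u2))

def check_hash(seed, N=8, p=10007, r=37):
    rng = random.Random(seed)
    kappa = N.bit_length()                        # kappa = ceil(log2(N + 1))
    s = [rng.randint(0, 1) for _ in range(N)]
    c = rng.randrange(p)
    u1 = [rng.randrange(p) for _ in range(N)]
    u2 = [rng.randint(0, 2**kappa) for _ in range(N)]
    tr = t_hashed(c, u1, u2, s, kappa, r)
    # [t]_r == [t_r]_r, and t_r takes one of O(N * r) values
    return tr % r == t_exact(c, u1, u2, s, kappa) % r and 0 <= tr <= (N + 1) * (r - 1)
```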

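Theorem 3. Let prime $p = \omega(N^2)$. There is a prime $r = O(N)$ and a multilinear symmetric polynomial $M$ such that, for all “hashed” ciphertexts $([2^{\kappa} \cdot c]_r, \{[-2^{\kappa} \cdot u'_i - u''_i]_r\})$ that encrypt $m \in \{0,1\}$, we have

$$m = M\Big(\underbrace{1,\dots,1}_{[2^{\kappa} c]_r}, \underbrace{0,\dots,0}_{r - [2^{\kappa} c]_r}, \underbrace{s_1,\dots,s_1}_{[-2^{\kappa} u'_1 - u''_1]_r}, \underbrace{0,\dots,0}_{r - [-2^{\kappa} u'_1 - u''_1]_r}, \dots, \underbrace{s_N,\dots,s_N}_{[-2^{\kappa} u'_N - u''_N]_r}, \underbrace{0,\dots,0}_{r - [-2^{\kappa} u'_N - u''_N]_r}\Big) \bmod p.$$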


Proof.  This  follows  easily  from  Lemmas  4  and  2.  □ 

Thus, decryption can be turned into a purely multilinear symmetric polynomial $M$ whose product gates output $\lambda_j \cdot P(a_j)$ (for known ciphertext-independent $\lambda_j$'s), where $P(z)$ is similar to the polynomial described in Section 4.1. Using the optimization of Section 4.1, we can compress the entire leveled FHE ciphertext down to a single MHE ciphertext that encrypts $P(a_1)$.
A Instantiations of Chimeric FHE

A.1 The Homomorphic Capacity of SWHE Schemes

Our instantiations are mildly sensitive to the tradeoff between the parameters of the SWHE scheme and its homomorphic capacity. Recall that when used with plaintext space $\mathbb{Z}_p$, the SWHE schemes that we consider have secret key $s \in \{0,1\}^N$ and decryption formula⁵ for post-processed ciphertexts $(c, \{u'_i\}, \{u''_i\})$:

$$m = c + \sum_{i=1}^{N} s_i \cdot u'_i + \left\lfloor 2^{-\kappa} \cdot \sum_{i=1}^{N} s_i \cdot u''_i \right\rceil \bmod p, \qquad (4)$$

with $\kappa = \lceil \log(N+1) \rceil$, $u'_i \in \mathbb{Z}_p$ and $u''_i \in \{0, 1, \dots, 2^{\kappa}\}$. Below we say that a scheme has threshold-type decryption if it has this decryption formula.

We are interested in the tradeoff between the number $N$ of secret-key bits and the degree of the polynomials that the scheme can evaluate homomorphically. For our instantiations, we only need the number of key bits to depend polynomially on the degree. Specifically, we need a polynomial bound $K$ such that the scheme with plaintext space $\mathbb{Z}_p$ and security parameter $\lambda$ can be made to support $\alpha$-degree polynomials with up to $2^{\beta}$ terms using secret keys of no more than $N = K(\lambda, \log p, \alpha, \beta)$ bits.

Below we say that a SWHE scheme is “homomorphic for low-degree polynomials” if it has a polynomial bound on the key size as above. It can be verified that all the known lattice-based SWHE schemes meet this condition.

A.2 Elgamal-based Instantiation

In the Introduction, we specified (in a fair amount of detail) an instantiation of chimeric leveled FHE that uses Elgamal as the MHE scheme. Here, we provide supporting lemmas and theorems to show that Elgamal is chimerically compatible with known SWHE schemes, as needed for the chimeric combination to actually work.

Theorem 4. Let $p = 2q + 1$ be a safe prime such that DDH holds in $QR(p)$, and let SWHE be an encryption scheme with message space $\mathbb{Z}_p$, which is homomorphic for low-degree polynomials and has threshold-type decryption. Then SWHE is chimerically compatible with Elgamal encryption modulo $p$ over plaintext space $QR(p)$.

Proof. Denote the security parameter by $\lambda$ and let $\alpha = \mathrm{poly}(\lambda)$ be another parameter (to be set later) governing the degree of polynomials that can be homomorphically evaluated by the scheme.

⁵This formula differs from Equation (2) in that we add rather than subtract the sums. This change was made to simplify notation in some of the arguments below, and it entails only a slight modification of the scheme.
The scheme SWHE can then be set to support polynomials of degree up to $\alpha$ having at most $2^{\alpha}$ terms, using a secret key $s \in \{0,1\}^N$ of size $N = K(\lambda, \log p, \alpha)$ for a polynomial $K$, with decryption formula Equation (4). Since $p$ must be super-polynomial in $\lambda$ (for DDH to hold), in particular $2N^2 < p$ and we can use Theorem 1.

We thus conclude that for any $A \subset \mathbb{Z}_p$ of cardinality $2N^2 + 1$, given a SWHE ciphertext $(c, \{u'_i\}, \{u''_i\})$ we can efficiently compute an $L_A$-restricted depth-3 circuit $C$ of $L_A$-degree at most $2N^2$ and at most $2N^2 + N + 1$ product gates, such that $C(s) = \mathrm{SWHE.Dec}_s(c, \{u'_i\}, \{u''_i\})$. We will thus use $D = 2N^2$ and $B = 2N^2 + N + 1$ as the bounds that are needed for Definition 2.

Next we need to establish that one can choose $A$ so that, for any $sk \in \mathrm{SWHE.KeyGen}$ and any polynomial $L_j \in L_A$, $L_j(sk)$ is in the plaintext space of our multiplicatively homomorphic scheme. In Lemma 5 below, we show that there are $(q-1)/2 = (p-3)/4$ values $a$ such that $a, a + 1 \in QR(p)$. Since $N$ is polynomial and $2N^2 + 1 \ll q$, we can populate $A$ with $2N^2 + 1$ such values efficiently. The value $a_j + x_i$ for $a_j \in A$ and secret-key bit $x_i$ is always in $QR(p)$, which is the Elgamal plaintext space.

In  this  construction  we  trivially  get  the  property  that  the  MHE  scheme  (i.e.,  Elgamal)  can 
evaluate  the  D  multiplications  needed  by  the  circuits  C,  since  the  multiplicative  homomorphic 
capacity  of  Elgamal  is  infinite. 

It remains to show that the homomorphic capacity of the SWHE scheme is sufficient to evaluate Elgamal decryption followed by one operation (i.e., the last bullet in Definition 1). It suffices to show that Elgamal decryption can be computed using a polynomial of degree at most $\alpha/2$ with at most $2^{\alpha/2}$ monomials, so that one further operation stays within our degree parameter $\alpha$. To prepare for decryption, we post-process each Elgamal ciphertext as follows: given a ciphertext $(y = g^r, z = m \cdot g^{-er}) \in \mathbb{Z}_p^2$, we compute $y_i = y^{2^i} - 1 \bmod p$ for $i = 0, 1, \dots, \lceil \log q \rceil - 1$, and the post-processed ciphertext is $(z, y_0, \dots, y_{\tau-1})$ with $\tau = \lceil \log q \rceil$. Given an Elgamal secret key $e \in \mathbb{Z}_q$ with binary representation $e_{\tau-1} \dots e_1 e_0$, decryption of the post-processed ciphertext becomes

$$\mathrm{MHE.Dec}(e; z, y_0, \dots, y_{\tau-1}) = z \cdot \prod_{i=0}^{\tau-1} \left(y^{2^i}\right)^{e_i} = z \cdot \prod_{i=0}^{\tau-1} (e_i \cdot y_i + 1) \qquad (5)$$

Being overly conservative and treating $z, y_0, \dots, y_{\tau-1}$ as variables, the degree of the polynomial above is $2\tau + 1$, and it has $2^{\tau}$ monomials. Hence, setting the degree parameter to $\alpha = 4\lceil \log q \rceil + 2$, we get a scheme whose homomorphic capacity is sufficient for Elgamal decryption followed by one operation.
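A toy check of the post-processing and of Equation (5): the sketch below runs Elgamal over $QR(p)$ for the small safe prime $p = 1019$ (with $q = 509$ and $g = 4$, which generates $QR(p)$) and confirms that $z \cdot \prod_i (e_i \cdot y_i + 1)$ recovers $m$. These parameters are illustrative assumptions and offer no security; the function names are hypothetical.

```python
import random

def post_process(y, p, tau):
    """y_i = y^(2^i) - 1 mod p for i = 0, ..., tau - 1."""
    return [(pow(y, 2**i, p) - 1) % p for i in range(tau)]

def dec_post_processed(e, z, ys, p):
    """Equation (5): z * prod_i (e_i * y_i + 1) mod p, where e_i is bit i of the key e."""
    acc = z % p
    for i, yi in enumerate(ys):
        acc = acc * (((e >> i) & 1) * yi + 1) % p
    return acc

def roundtrip(seed, p=1019, q=509, g=4):
    """Toy Elgamal over QR(p) for the safe prime p = 2q + 1 (far too small to be secure)."""
    rng = random.Random(seed)
    e = rng.randrange(1, q)                        # secret key
    h = pow(g, e, p)                               # public key
    m = pow(g, rng.randrange(q), p)                # plaintext in QR(p)
    r = rng.randrange(1, q)
    y, z = pow(g, r, p), m * pow(h, -r, p) % p     # (g^r, m * g^(-e r))
    tau = (q - 1).bit_length()                     # tau = ceil(log2 q)
    return dec_post_processed(e, z, post_process(y, p, tau), p) == m
```

Each factor $e_i \cdot y_i + 1$ equals $y^{2^i}$ when $e_i = 1$ and $1$ when $e_i = 0$, which is exactly why the product reproduces $y^{e}$.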

It remains to see that this choice of parameters is consistent. Note that the only constraints that we use in this proof are that $p = \lambda^{\omega(1)}$ (so that DDH is hard), $(p-1)/2 = q > 2N^2 + 1 = \mathrm{poly}(\lambda, \log q, \alpha)$ (in order to be able to use Theorem 1), and $\alpha \ge 4\lceil \log q \rceil + 2$ (to get sufficient homomorphic capacity). Clearly, if $p$ is exponential in $\lambda$ (so $\alpha$ is polynomial in $\lambda$) then all of these constraints are satisfied. □

Lemma 5. Let $p$ be a prime, and let $S = \{(X, Y) : X = Y + 1;\ X, Y \in QR(p)\}$. Then $|S| = (p-3)/4$ if $p \equiv 3 \bmod 4$, and $|S| = (p-5)/4$ if $p \equiv 1 \bmod 4$.

Proof. Let $T = \{(u, v) : u \ne 0, v \ne 0, u^2 - v^2 = 1 \bmod p\}$. Since $X$ and $Y$ each have exactly two nonzero square roots if they are quadratic residues, we have that $|T| = 4 \cdot |S|$. It remains to establish the cardinality of $T$.

For each pair $(u, v) \in T$, let $a_{uv} = u + v$. We claim that distinct pairs in $T$ cannot have the same value of $a_{uv}$. In particular, each $a_{uv}$ completely determines both $u$ and $v$ as follows. We have $u^2 - v^2 = 1 \Rightarrow (u - v)(u + v) = 1 \Rightarrow u - v = 1/a_{uv}$, and therefore $u = (a_{uv} + a_{uv}^{-1})/2$ and $v = (a_{uv} - a_{uv}^{-1})/2$. Conversely, every $a \ne 0$ for which these two values are nonzero yields a pair in $T$. We therefore have $|U| = |T|$, where $U = \{a \ne 0 : a + a^{-1} \ne 0,\ a - a^{-1} \ne 0\}$.

We have that $a \in U$ unless $a = 0$, $a^2 = -1 \bmod p$, or $a = \pm 1$. If $p \equiv 1 \bmod 4$, then $-1 \in QR(p)$, and therefore there are 5 prohibited values of $a$, i.e., $|U| = p - 5$. If $p \equiv 3 \bmod 4$, then $-1 \notin QR(p)$, and therefore $|U| = p - 3$. □
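Lemma 5 is easy to confirm by brute force for small primes. This Python sketch (an illustration, not part of the proof) counts the pairs in $S$ directly and compares the count with the formula; the helper names are hypothetical.

```python
def qr_set(p):
    """Nonzero quadratic residues modulo the odd prime p."""
    return {x * x % p for x in range(1, p)}

def count_S(p):
    """|S|: number of Y in QR(p) such that X = Y + 1 is also in QR(p)."""
    qr = qr_set(p)
    return sum(1 for y in qr if (y + 1) % p in qr)

def lemma5(p):
    """The count predicted by Lemma 5."""
    return (p - 3) // 4 if p % 4 == 3 else (p - 5) // 4

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))
```

For example, for $p = 13$ the residues are $\{1, 3, 4, 9, 10, 12\}$ and the qualifying $Y$ values are $3$ and $9$, matching $(13 - 5)/4 = 2$.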

A.3 Leveled FHE Based on Worst-Case Hardness

We next describe an instantiation where both the SWHE and the MHE schemes are lattice-based encryption schemes with security based (quantumly) on the hardness of worst-case problems over ideal lattices, in particular ideal-SIVP. This scheme could be Gentry’s SWHE scheme [Gen09b, Gen10], one of its variants [SS10, SV10, GH11], or one of the more recent proposals based on the ring-LWE problem [BV11, Pei11]. All these schemes are homomorphic for low-degree polynomials and have threshold-type decryption, in the sense of Section A.1.

The main idea of this construction is to use an additively homomorphic encryption (AHE) scheme (e.g., one using lattices) as our MHE scheme, by working with discrete logarithms. For a multiplicative group $G$ with order $q$ and generator $g$, we can view an additively homomorphic scheme AHE with plaintext space $\mathbb{Z}_q$ as a multiplicatively homomorphic scheme MHE with plaintext space $G$: in the MHE scheme, a ciphertext $c$ is decrypted as $\mathrm{MHE.Decrypt}(c) = g^{\mathrm{AHE.Decrypt}(c)}$. The additive homomorphism mod $q$ thus becomes a multiplicative homomorphism in $G$. We can therefore use MHE as a component in chimeric leveled FHE, assuming it is compatible with a suitable SWHE scheme. One caveat is that MHE’s Encrypt algorithm is not obvious. Presumably, to encrypt an element $x \in G$, we encrypt its discrete log using AHE’s Encrypt algorithm, but this requires computing discrete logs in $G$. Fortunately, in our instantiation we can choose a group $G$ of polynomial size, so computing discrete logs in $G$ can be done efficiently.

The  main  difficulty  is  to  set  the  parameters  so  that  the  component  schemes  each  have  enough 
homomorphic  capacity  to  do  their  jobs. 

This  sort  of  compatibility  was  easy  for  the  Elgamal-based  instantiation,  since  the  parameters 
of  the  Elgamal  scheme  do  not  grow  with  the  multiplicative  homomorphic  capacity  required  of  the 
Elgamal scheme; Elgamal’s multiplicative homomorphic capacity is infinite, regardless of parameters. On the other hand, the additive homomorphic capacity of a lattice-based scheme is limited, as
system  parameters  must  grow  (albeit  slowly)  to  allow  more  additions.  What  makes  it  possible  to 
set  the  parameters  is  the  fact  that  such  schemes  can  handle  a  super-polynomial  number  of  additions. 

Below let us fix some SWHE construction which is homomorphic for low-degree polynomials and has threshold-type decryption (e.g., Gentry’s scheme [Gen09b, Gen10]). For our construction we will use a polynomial-size plaintext space, namely $\mathbb{Z}_p$ for some $p = \mathrm{poly}(\lambda)$. In more detail, we will use two instances of the same scheme: a “large instance”, denoted Lrg, as the SWHE of our chimeric construction, and a “small instance”, denoted Sml, for the MHE of our chimeric construction. The plaintext space for Lrg is set as $\mathbb{Z}_p$ for a small prime $p = \mathrm{poly}(\lambda)$, and the plaintext space for Sml is set as $\mathbb{Z}_q$ for $q = p - 1$.

We will use the small instance as a multiplicatively homomorphic encryption scheme with plaintext space $\mathbb{Z}_p^*$. Below let $g$ be a generator of $\mathbb{Z}_p^*$. Encryption of a plaintext $x \in \mathbb{Z}_p^*$ under this MHE scheme consists of first computing the discrete logarithm of $x$ to the base $g$, i.e., $e \in \mathbb{Z}_q$ such that $g^e = x \pmod p$, then encrypting $e$ under Sml. Similarly, MHE decryption of a ciphertext $c$ consists of using the “native decryption” of Sml to recover the “native plaintext” $e \in \mathbb{Z}_q$, then exponentiating $x = g^e \bmod p$.
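To make the discrete-log view concrete, the sketch below wraps an additively homomorphic scheme over $\mathbb{Z}_q$ as an MHE scheme over $\mathbb{Z}_p^*$ with $p = 1019$, $q = 1018$ and $g = 2$ (a generator of $\mathbb{Z}_{1019}^*$). The class `ToyAHE` is a hypothetical, deliberately insecure linear stand-in for Sml, used only to show the data flow: discrete log then native encryption, and native decryption then exponentiation.

```python
import random

P, Q, G = 1019, 1018, 2          # toy prime p, q = p - 1, and a generator g of Z_p^*

# Discrete-log table for Z_p^*: feasible only because p is of polynomial size.
DLOG = {pow(G, e, P): e for e in range(Q)}

class ToyAHE:
    """INSECURE linear stand-in for Sml: additively homomorphic over Z_q, illustration only."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.k = self.rng.randrange(Q)

    def enc(self, e):
        rho = self.rng.randrange(Q)
        return (rho, (e + self.k * rho) % Q)

    def dec(self, ct):
        rho, b = ct
        return (b - self.k * rho) % Q

def add(c1, c2):
    """Homomorphic addition of native plaintexts mod q."""
    return ((c1[0] + c2[0]) % Q, (c1[1] + c2[1]) % Q)

def mhe_enc(ahe, x):
    """MHE encryption of x in Z_p^*: take the discrete log, then encrypt it natively."""
    return ahe.enc(DLOG[x % P])

def mhe_dec(ahe, ct):
    """MHE decryption: native decryption gives e in Z_q, then output g^e mod p."""
    return pow(G, ahe.dec(ct), P)

mhe_mul = add   # adding exponents mod q multiplies the plaintexts in Z_p^*
```

Because $g$ has order exactly $q$, reducing the exponent mod $q$ loses nothing, so `mhe_dec(ahe, mhe_mul(c1, c2))` returns the product of the two plaintexts mod $p$.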
The homomorphic capacity of Lrg must be large enough to evaluate the decryption of Sml followed by exponentiation mod $p$ and then a quadratic polynomial. The parameters of Sml can be chosen much smaller, since it only needs to support addition of polynomially many terms and not even a single multiplication.⁶


A.3.1 Decryption under Sml

The small instance has $n$ bits of secret key, where $n$ is some parameter to be determined later (selected to support large enough homomorphic capacity to evaluate linear polynomials with polynomially many terms). Since native decryption of Sml is of the form of Equation (4), decryption under the MHE scheme has the following formula


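$$\mathrm{MHE.Dec}_{sk}(c) = g^{c} \cdot g^{\sum_{i=1}^{n} u'_i s_i} \cdot g^{\left\lfloor 2^{-\kappa} \sum_{i=1}^{n} u''_i s_i \right\rceil} \bmod p \qquad (6)$$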


where $(c, \{u'_i, u''_i\})$ is the post-processed ciphertext (with $u'_i \in \mathbb{Z}_q$, $u''_i \in \mathbb{Z}_{2^{\kappa}}$, and $\kappa = \lceil \log(n+1) \rceil$). Below we show how this formula can be evaluated as a rather low-degree arithmetic circuit.


The complicated part. To evaluate the “complicated part”, $\lfloor 2^{-\kappa} \sum_i u''_i s_i \rceil$, as an arithmetic circuit mod $p$ (with input the bits $s_i$), we will construct a mod-$p$ circuit that outputs the binary representation of the sum. We have $n$ binary numbers, each with $\kappa$ bits, and we need to add them over the integers and then ignore the lower $\kappa$ bits. Certainly, each bit of the result can be expressed mod $p$ as a multilinear polynomial of degree only $n \cdot \kappa$ over the $n \cdot \kappa$ bits of the addends. It is challenging, however, to show that these low-degree representations can actually be computed efficiently.

In any case, we can easily compute the sum using polynomials of degree $n \cdot \kappa^c$ for small $c$, as follows. Consider a single column $x \in \{0,1\}^n$ of the sum. Each bit in the binary representation of the Hamming weight of $x$ can be expressed as a mod-$p$ multilinear symmetric polynomial of degree $n$ over $x$. After using degree $n$ to obtain the binary representation of the Hamming weight of each column, it only remains to add the $\kappa$ Hamming weights, each a $\kappa$-bit number, together (each Hamming weight shifted appropriately depending on the significance of its associated column) using degree only $\kappa^c$. Adding $\kappa$ numbers of $\kappa$ bits each is in NC1, and in particular can be accomplished with low degree using the “3-for-2” trick (see [KR]): repeatedly replace each three addends by two addends that correspond to the XOR and CARRY (and hence have the same sum), each replacement costing only constant degree, and finally sum the final two addends directly. Over $\mathbb{Z}_p$, the 3-for-2 trick is done using the formulas

$$\mathrm{XOR}(x, y, z) = 4xyz - 2(xy + xz + yz) + x + y + z$$
$$\mathrm{CARRY}(x, y, z) = xy + xz + yz - 2xyz$$
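The following Python sketch checks the two formulas on all bit triples and uses them in a carry-save (3-for-2) adder that reduces a list of numbers to two addends round by round before the final sum. It is an illustration evaluated on plaintext bits in the clear, with a toy modulus $p = 1019$; the function names are hypothetical, and the round-by-round grouping is one reasonable schedule rather than necessarily the one used in [KR].

```python
def xor3(x, y, z, p):
    """XOR of three bits as a polynomial over Z_p."""
    return (4 * x * y * z - 2 * (x * y + x * z + y * z) + x + y + z) % p

def carry3(x, y, z, p):
    """CARRY (majority) of three bits as a polynomial over Z_p."""
    return (x * y + x * z + y * z - 2 * x * y * z) % p

def three_for_two(a, b, c, p):
    """Replace three bit vectors (least significant bit first) by two with the same sum."""
    n = max(len(a), len(b), len(c)) + 1
    a, b, c = (v + [0] * (n - len(v)) for v in (a, b, c))
    xors = [xor3(a[i], b[i], c[i], p) for i in range(n)]
    carries = [0] + [carry3(a[i], b[i], c[i], p) for i in range(n - 1)]  # shifted left by one
    return xors, carries

def to_bits(v, width):
    return [(v >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def add_many(nums, p=1019):
    """Carry-save addition: each round replaces disjoint triples of addends by pairs."""
    width = max(1, max(nums).bit_length())
    vecs = [to_bits(v, width) for v in nums]
    while len(vecs) > 2:
        nxt = []
        for i in range(0, len(vecs) - 2, 3):
            nxt.extend(three_for_two(vecs[i], vecs[i + 1], vecs[i + 2], p))
        nxt.extend(vecs[len(vecs) - len(vecs) % 3:])   # 0, 1 or 2 leftover addends
        vecs = nxt
    return sum(from_bits(v) for v in vecs)            # final two addends summed directly
```

Each round shrinks the number of addends by a factor of about $2/3$, so $O(\log \kappa)$ rounds of constant-degree gates suffice, which is where the $\kappa^c$ degree bound comes from.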


⁶The “small” scheme could also be instantiated from other additively homomorphic lattice-based schemes, e.g., one of Regev’s schemes [Reg04, Reg09], or the GPV scheme [GPV08], etc.

The simple part and exponentiation. Although it is possible to compute the simple part similarly to the complicated part, it is easier to just push this computation into the exponentiation step. Specifically, we now have a $\kappa$-bit number $v_0$ that we obtained as the result of the “complicated part”, and we also have the $\lceil \log q \rceil$-bit numbers $v_i = u'_i s_i$ for $i = 1, \dots, n$ (all represented in binary), and we want to compute $g^{c} \cdot g^{\sum_{i=0}^{n} v_i} \bmod p$. Denote the binary representation of each $v_i$ by its bits $v_{ij}$, namely $v_i = \sum_j v_{ij} 2^j$. Then we compute

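$$g^{c + \sum_{i=0}^{n} v_i} = g^{c + \sum_{i,j} v_{ij} 2^j} = g^{c} \cdot \prod_{i,j} \left(g^{2^j}\right)^{v_{ij}} = g^{c} \cdot \prod_{i,j} \left(v_{ij} \cdot g^{2^j} + (1 - v_{ij}) \cdot 1\right)$$
$$= \underbrace{\prod_{j=0}^{\kappa} \left(1 + v_{0j} \cdot (g^{2^j} - 1)\right)}_{\text{complicated part}} \cdot \underbrace{g^{c} \cdot \prod_{i=1}^{n} \prod_{j=0}^{\lceil \log q \rceil} \left(1 + v_{ij} \cdot (g^{2^j} - 1)\right)}_{\text{simple part}}$$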

The terms $g^c$ and $(g^{2^j} - 1)$ are known constants in $\mathbb{Z}_p$, hence we have a representation of the decryption formula as an arithmetic circuit mod $p$.
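As a sanity check of the identity above, this Python sketch evaluates the bit-decomposed product in the clear and compares it with direct exponentiation. It assumes toy values $p = 1019$ and $g = 2$ (a generator of $\mathbb{Z}_{1019}^*$, so $q = 1018$) and treats all $v_i$, including $v_0$, as plain integers of a common bit width; the function name is hypothetical.

```python
def exp_via_bits(c, values, g, p, width):
    """g^c * prod_{i,j} (1 + v_ij * (g^(2^j) - 1)) mod p, where v_ij is bit j of values[i].

    Each factor is affine in a single bit v_ij, so as a polynomial in the bits the product
    has degree at most the total number of bits; g^c and g^(2^j) - 1 are key-independent.
    """
    consts = [(pow(g, 2**j, p) - 1) % p for j in range(width)]
    acc = pow(g, c, p)
    for v in values:
        for j in range(width):
            acc = acc * (1 + ((v >> j) & 1) * consts[j]) % p
    return acc
```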

To bound the degree of the complicated part, notice that $v_0$ has $\kappa$ bits, each a polynomial of degree at most $n \cdot \mathrm{poly}(\kappa)$, hence the entire term has degree bounded by $n \cdot \mathrm{poly}(\kappa)$. For the simple part, all the $v_i$'s together have $n \lceil \log q \rceil$ bits (each is just a variable), so the degree of that term is bounded by just $n \lceil \log q \rceil$. Hence the total degree of the decryption formula is $\tilde{O}(n)$, assuming $q$ is quasi-polynomial in $n$. One can also verify that the number of terms is $2^{\tilde{O}(n)}$. (Known lattice-based SWHE schemes have $n = \tilde{O}(\lambda)$, in which case Sml's decryption has degree $\tilde{O}(\lambda)$.)

A.3.2 The SWHE scheme Lrg.

The large instance has $N$ bits of secret key, where $N$ is some parameter to be determined later, selected to support large enough homomorphic capacity to be compatible with Sml. As explained in Section 2, the decryption of Lrg can be expressed as a restricted depth-3 circuit of degree at most $2N^2$ and with at most $2N^2 + N + 1$ product gates. Note that the number of summands in the top addition is at most $2N^2 + N + 1 \le 3N^2$ (for $N \ge 2$).

A.3.3 Setting the parameters.

Lemma  6.  Let  Lrg  and  Sml  be  as  above.  We  can  choose  the  parameters  of  Lrg  and  Sml  so  that  Lrg 
is  chimerically  compatible  with  the  MHE  derived  from  Sml. 

Proof. Denoting the security parameter by $\lambda$, below we choose the plaintext spaces and parameters $\alpha, \beta$, where Lrg can support polynomials of degree up to $\alpha$ with $2^{\alpha}$ terms and Sml can support linear polynomials with up to $2^{\beta}$ terms, so as to get chimerically compatible schemes. Note that making the plaintext spaces of the two schemes compatible is simple: all we need to do is choose a prime $p$, set $q = p - 1$, and let the plaintext spaces of Lrg and Sml be $\mathbb{Z}_p$ and $\mathbb{Z}_q$, respectively. In terms of size constraints on the parameters, we have the following:

• $p > 2N^2$, so that we can use Theorem 1.

• $p = \mathrm{poly}(\lambda)$, so that we can compute discrete logs modulo $p$ efficiently.

• $\beta \ge \log(2N^2) = 2\log N + 1$, since the restricted depth-3 circuits for the decryption of Lrg all have degree at most $2N^2$, hence we need an MHE scheme that supports $2N^2$ products, which means that Sml should support linear functions with $2N^2$ terms.

• $\alpha$ is at least twice the degree of Sml’s decryption, so that we can compute a multiplication within Lrg after evaluating Sml’s decryption function.

Up front, we are promised polynomial bounds $K_{\mathrm{Sml}}, K_{\mathrm{Lrg}}$ such that the key-size of Sml is bounded by $n \le K_{\mathrm{Sml}}(\lambda, \log q, \beta)$ and the key-size of Lrg is bounded by $N \le K_{\mathrm{Lrg}}(\lambda, \log p, \alpha)$.

Assuming $N = \mathrm{poly}(\lambda)$ (we establish this later), we can meet the first three constraints by choosing a prime $p \in [2N^2 + 1, 4N^2]$ (such a prime exists by Bertrand's postulate and can be found efficiently) and $\beta = \log p$. Then $K_{\mathrm{Sml}}(\lambda, \log q, \beta) = o(\lambda^{c_{\mathrm{Sml}} + \epsilon})$ for any $\epsilon > 0$ and some constant $c_{\mathrm{Sml}}$. We argued before that when Sml has $n$-bit keys, decryption can be computed with degree $O(n \cdot (\log^{c_2} n + \log q))$ for some constant $c_2$. Therefore, still assuming that $N = \mathrm{poly}(\lambda)$, all of the constraints can be satisfied with $\alpha = O(\lambda^{c_{\mathrm{Sml}} + \epsilon})$ for any $\epsilon > 0$. But then of course $N$ can be $\mathrm{poly}(\lambda)$, since it is bounded by $K_{\mathrm{Lrg}}(\lambda, \log p, \alpha)$. □
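Choosing $p$ as in the proof is straightforward in practice. A minimal sketch, assuming trial division is adequate because $p = O(N^2)$ is polynomial; the existence of a prime in $(2N^2, 4N^2]$ follows from Bertrand's postulate, and the function names are hypothetical:

```python
import math

def is_prime(m):
    """Trial division; adequate for the polynomial-size p used here."""
    return m > 1 and all(m % d for d in range(2, math.isqrt(m) + 1))

def choose_parameters(N):
    """Sketch of Lemma 6's choice: a prime p in [2N^2 + 1, 4N^2], q = p - 1, beta = log2 p."""
    for p in range(2 * N * N + 1, 4 * N * N + 1):
        if is_prime(p):
            return p, p - 1, math.log2(p)
    raise AssertionError("unreachable by Bertrand's postulate")
```

Since $p > 2N^2$, the resulting $\beta = \log p$ automatically satisfies $\beta \ge 2\log N + 1$.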

Using Gentry’s scheme and proof [Gen09b, Gen10] we get:

Corollary 1. There exists a leveled FHE scheme whose security is reducible via a quantum reduction to the worst-case hardness of S(I)VP in ideal lattices, ideal-SIVP. ■

B  Proof  of  Lemma  1 

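Proof. (Lemma 1) Every multilinear symmetric polynomial $M(x)$ is a linear combination of the elementary symmetric polynomials: $M(x) = \sum_{i=0}^{n} \ell_i \cdot e_i(x)$. Given the evaluation $M(x)$ over binary vectors $x = 1^k 0^{n-k}$, we can compute the $\ell_i$'s as follows. We obtain the constant term $\ell_0 \cdot e_0(x) = \ell_0$ by evaluating $M$ at $0^n$. We obtain $\ell_k$ recursively via

$$M(1^k 0^{n-k}) = \sum_{i=0}^{n} \ell_i \cdot e_i(1^k 0^{n-k}) = \ell_k + \sum_{i=0}^{k-1} \ell_i \cdot e_i(1^k 0^{n-k})$$
$$\Rightarrow \quad \ell_k = M(1^k 0^{n-k}) - \sum_{i=0}^{k-1} \ell_i \cdot e_i(1^k 0^{n-k}) = M(1^k 0^{n-k}) - \sum_{i=0}^{k-1} \ell_i \cdot \binom{k}{i},$$

since $e_i(1^k 0^{n-k}) = \binom{k}{i}$, which equals 1 for $i = k$ and 0 for $i > k$.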

At this point, it suffices to prove the lemma just for the elementary symmetric polynomials. This is because we have shown that we can efficiently obtain a representation of $M(x)$ as a linear combination of the elementary symmetric polynomials, and we can clearly use the known $\ell_i$ values to “merge” together the depth-3 representations of the elementary symmetric polynomials that satisfy the constraints of Lemma 1 into a depth-3 representation of $M$ that satisfies the constraints.

For each $i$, the value $e_i(x)$ is the coefficient of $z^{n-i}$ in the polynomial $P(z) = \prod_{j=1}^{n} (z + x_j)$. We can compute the coefficients of $P(z)$ via interpolation from the values $P(a)$, $a \in A$. Therefore, each value $e_i(x)$ can be computed by an $L_A$-restricted depth-3 arithmetic circuit as follows: using $n + 1$ product gates, compute the values $P(a)$, $a \in A$, and then (as the final sum gate) interpolate the coefficient of $z^{n-i}$ from the $P(a)$ values. □
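Both steps of this proof can be sketched in Python: recovering the $\ell_k$'s from the values $M(1^k 0^{n-k})$, and computing every $e_i(x)$ from $n+1$ products $P(a) = \prod_j (a + x_j)$ followed by a single Lagrange interpolation. This is an illustrative sketch over the toy modulus $p = 1019$ with hypothetical function names; it evaluates the circuit in the clear rather than homomorphically.

```python
from math import comb, prod

def recover_ell(M_on_weight, n, p):
    """l_k = M(1^k 0^(n-k)) - sum_{i<k} l_i * C(k, i) mod p."""
    ell = []
    for k in range(n + 1):
        ell.append((M_on_weight(k) - sum(ell[i] * comb(k, i) for i in range(k))) % p)
    return ell

def mul_linear(poly, root, p):
    """poly * (z - root) mod p; coefficients are stored lowest degree first."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i + 1] = (out[i + 1] + c) % p
        out[i] = (out[i] - root * c) % p
    return out

def interpolate(points, values, p):
    """Coefficients of the unique polynomial of degree < len(points) through the data."""
    n = len(points)
    result = [0] * n
    for i, (ai, vi) in enumerate(zip(points, values)):
        basis, denom = [1], 1
        for j, aj in enumerate(points):
            if j != i:
                basis = mul_linear(basis, aj, p)
                denom = denom * (ai - aj) % p
        scale = vi * pow(denom, -1, p) % p
        result = [(r + scale * b) % p for r, b in zip(result, basis)]
    return result

def elementary_via_products(x, A, p):
    """e_0(x), ..., e_n(x): n + 1 product gates P(a) = prod_j (a + x_j), then interpolation."""
    n = len(x)
    values = [prod(a + xj for xj in x) % p for a in A[: n + 1]]
    coeffs = interpolate(A[: n + 1], values, p)
    return [coeffs[n - i] for i in range(n + 1)]   # e_i(x) is the coefficient of z^(n-i)
```

The points in `A` must be distinct mod $p$ so that every Lagrange denominator is invertible.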
Abstract

We present a radically new approach to fully homomorphic encryption (FHE) that dramatically improves performance and bases security on weaker assumptions. A central conceptual contribution in our work is a new way of constructing leveled fully homomorphic encryption schemes (capable of evaluating arbitrary polynomial-size circuits), without Gentry’s bootstrapping procedure.

Specifically, we offer a choice of FHE schemes based on the learning with errors (LWE) or ring-LWE (RLWE) problems that have $2^{\lambda}$ security against known attacks. For RLWE, we have:

• A leveled FHE scheme that can evaluate $L$-level arithmetic circuits with $\tilde{O}(\lambda \cdot L^3)$ per-gate computation, i.e., computation quasi-linear in the security parameter. Security is based on RLWE for an approximation factor exponential in $L$. This construction does not use the bootstrapping procedure.

• A leveled FHE scheme that uses bootstrapping as an optimization, where the per-gate computation (which includes the bootstrapping procedure) is $\tilde{O}(\lambda^2)$, independent of $L$. Security is based on the hardness of RLWE for quasi-polynomial factors (as opposed to the sub-exponential factors needed in previous schemes).

We obtain similar results for LWE, but with worse performance. We introduce a number of further optimizations to our schemes. As an example, for circuits of large width, e.g., where a constant fraction of levels have width at least $\lambda$, we can reduce the per-gate computation of the bootstrapped version to $\tilde{O}(\lambda)$, independent of $L$, by batching the bootstrapping operation. Previous FHE schemes all required $\tilde{\Omega}(\lambda^{3.5})$ computation per gate.

At the core of our construction is a much more effective approach for managing the noise level of lattice-based ciphertexts as homomorphic operations are performed, using some new techniques recently introduced by Brakerski and Vaikuntanathan (FOCS 2011).
This material is based on research sponsored by DARPA under Agreement number FA8750-11-2-0225. All disclaimers as above apply.


2.  Fully  Homomorphic  Encryption  without  Bootstrapping 


1  Introduction 

Ancient History. Fully homomorphic encryption (FHE) [19, 8] allows a worker to receive encrypted data and perform arbitrarily-complex dynamically-chosen computations on that data while it remains encrypted, despite not having the secret decryption key. Until recently, all FHE schemes [8, 6, 20, 10, 5, 4] followed the same blueprint, namely the one laid out in Gentry's original construction [8, 7].

The first step in Gentry's blueprint is to construct a somewhat homomorphic encryption (SWHE) scheme, namely an encryption scheme capable of evaluating “low-degree” polynomials homomorphically. Starting with Gentry's original construction based on ideal lattices [8], there are by now a number of such schemes in the literature [6, 20, 10, 5, 4, 13], all of which are based on lattices (either directly or implicitly). The ciphertexts in all these schemes are “noisy”, with a noise that grows slightly during homomorphic addition and explosively during homomorphic multiplication, hence the limitation to low-degree polynomials.

To obtain FHE, Gentry provided a remarkable bootstrapping theorem which states that given a SWHE scheme that can evaluate its own decryption function (plus an additional operation), one can transform it into a “leveled”¹ FHE scheme. Bootstrapping “refreshes” a ciphertext by running the decryption function on it homomorphically, using an encrypted secret key (given in the public key), resulting in reduced noise.

As  if  by  a  strange  law  of  nature,  SWHE  schemes  tend  to  be  incapable  of  evaluating  their  own  decryption 
circuits  (plus  some)  without  significant  modifications.  (We  discuss  recent  exceptions  [9,  3]  below.)  Thus, 
the  final  step  is  to  squash  the  decryption  circuit  of  the  SWHE  scheme,  namely  transform  the  scheme  into  one 
with  the  same  homomorphic  capacity  but  a  decryption  circuit  that  is  simple  enough  to  allow  bootstrapping. 
Gentry  [8]  showed  how  to  do  this  by  adding  a  “hint”  -  namely,  a  large  set  with  a  secret  sparse  subset  that 
sums  to  the  original  secret  key  -  to  the  public  key  and  relying  on  a  “sparse  subset  sum”  assumption. 

1.1  Efficiency  of  Fully  Homomorphic  Encryption 

The efficiency of fully homomorphic encryption has been a (perhaps, the) big question following its invention.
In this paper, we are concerned with the per-gate computation overhead of the FHE scheme, defined as the ratio
of the time it takes to compute a circuit homomorphically to the time it takes to compute it in the clear.2
Unfortunately, FHE schemes that follow Gentry’s blueprint (some of which have actually been implemented [10, 5])
have fairly poor performance: their per-gate computation overhead is $p(\lambda)$, a large polynomial in the
security parameter. In fact, we would like to argue that this penalty in performance is somewhat inherent for
schemes that follow this blueprint.

First,  the  complexity  of  (known  approaches  to)  bootstrapping  is  inherently  at  least  the  complexity  of 
decryption  times  the  bit-length  of  the  individual  ciphertexts  that  are  used  to  encrypt  the  bits  of  the  secret 
key.  The  reason  is  that  bootstrapping  involves  evaluating  the  decryption  circuit  homomorphically  -  that  is, 
in  the  decryption  circuit,  each  secret-key  bit  is  replaced  by  a  (large)  ciphertext  that  encrypts  that  bit  -  and 
both the complexity of decryption and the ciphertext lengths must each be $\Omega(\lambda)$.

Second, the undesirable properties of known SWHE schemes conspire to ensure that the real cost of
bootstrapping for FHE schemes that follow this blueprint is actually much worse than quadratic. Known
FHE schemes start with a SWHE scheme that can evaluate polynomials of degree $D$ (multiplicative depth
$\log D$) securely only if the underlying lattice problem is hard to $2^D$-approximate in $2^\lambda$ time. For this to
be hard, the lattice must have dimension $\Omega(D \cdot \lambda)$.3 Moreover, the coefficients of the vectors used in the
scheme have bit length $\Omega(D)$ to allow the ciphertext noise room to expand to $2^D$. Therefore, the size of
“fresh” ciphertexts (e.g., those that encrypt the bits of the secret key) is $\Omega(D^2 \cdot \lambda)$. Since the SWHE scheme
must be “bootstrappable”, i.e., capable of evaluating its own decryption function, $D$ must exceed the
degree of the decryption function. Typically, the degree of the decryption function is $\Omega(\lambda)$. Thus, overall,
“fresh” ciphertexts have size $\Omega(\lambda^3)$. So, the real cost of bootstrapping (even if we optimistically assume
that the “stale” ciphertext that needs to be refreshed can be decrypted in only $\Theta(\lambda)$ time) is $\Omega(\lambda^4)$.

1In a “leveled” FHE scheme, the size of the public key is linear in the depth of the circuits that the scheme can evaluate. One
can obtain a “pure” FHE scheme (with a constant-size public key) from a leveled FHE scheme by assuming “circular security”,
namely, that it is safe to encrypt the leveled FHE secret key under its own public key. We will omit the term “leveled” in this work.

2Other measures of efficiency, such as ciphertext/key size and encryption/decryption time, are also important. In fact, the schemes
we present in this paper are very efficient in these aspects (as are the schemes in [9, 3]).

3This is because we have lattice algorithms in $n$ dimensions that compute $2^{n/\lambda}$-approximations of short vectors in time $2^{\tilde{O}(\lambda)}$.

The analysis above ignores a nice optimization by Stehlé and Steinfeld [22], which so far has not been
useful in practice, that uses Chernoff bounds to asymptotically reduce the decryption degree down to $O(\sqrt{\lambda})$.
With this optimization, the per-gate computation of FHE schemes that follow the blueprint is $\Omega(\lambda^3)$.4
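The two cost estimates above chain several bounds together. The derivation below only restates quantities already given in the text (fresh-ciphertext size as lattice dimension times coefficient bit length, and bootstrapping cost as decryption work times fresh-ciphertext size); for the second case it additionally assumes, as a simplification, that $D$ is chosen just above the reduced decryption degree.

$$
\begin{aligned}
|\text{fresh ciphertext}| &= \underbrace{\Omega(D\cdot\lambda)}_{\text{dimension}}\cdot\underbrace{\Omega(D)}_{\text{bits per coefficient}} = \Omega(D^2\cdot\lambda),\\
D = \Omega(\lambda):\quad |\text{fresh ciphertext}| &= \Omega(\lambda^3), \qquad \text{bootstrapping} = \Theta(\lambda)\cdot\Omega(\lambda^3) = \Omega(\lambda^4),\\
D = \Theta(\sqrt{\lambda}):\quad |\text{fresh ciphertext}| &= \Omega(\lambda^2), \qquad \text{bootstrapping} = \Theta(\lambda)\cdot\Omega(\lambda^2) = \Omega(\lambda^3).
\end{aligned}
$$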

Recent  Deviations  from  Gentry’s  Blueprint,  and  the  Hope  for  Better  Efficiency.  Recently,  Gentry  and 
Halevi  [9],  and  Brakerski  and  Vaikuntanathan  [3],  independently  found  very  different  ways  to  construct  FHE 
without  using  the  squashing  step,  and  thus  without  the  sparse  subset  sum  assumption.  These  schemes  are  the 
first  major  deviations  from  Gentry’s  blueprint  for  FHE.  Brakerski  and  Vaikuntanathan  [3]  manage  to  base 
security  entirely  on  LWE  (for  sub-exponential  approximation  factors),  avoiding  reliance  on  ideal  lattices. 

From an efficiency perspective, however, these results are not a clear win over previous schemes. Both of
the schemes still rely on the problematic aspects of Gentry’s blueprint, namely bootstrapping and an SWHE
scheme with the undesirable properties discussed above. Thus, their per-gate computation is still $\Omega(\lambda^4)$ (in
fact, that is an optimistic evaluation of their performance). Nevertheless, the techniques introduced in these
recent constructions are very interesting and useful to us. In particular, we use the tools and techniques
introduced by Brakerski and Vaikuntanathan [3] in an essential way to achieve remarkable efficiency gains.

An  important,  somewhat  orthogonal  question  is  the  strength  of  assumptions  underlying  FHE  schemes. 
All  the  schemes  so  far  rely  on  the  hardness  of  short  vector  problems  on  lattices  with  a  subexponential 
approximation  factor.  Can  we  base  FHE  on  polynomial  hardness  assumptions? 

1.2  Our  Results  and  Techniques 

We leverage Brakerski and Vaikuntanathan’s techniques [3] to achieve asymptotically very efficient FHE
schemes. Also, we base security on lattice problems with quasi-polynomial approximation factors. (Previous
schemes all used sub-exponential factors.) In particular, we have the following theorem (informal):

• Assuming Ring LWE for an approximation factor exponential in $L$, we have a leveled FHE scheme
that can evaluate $L$-level arithmetic circuits without using bootstrapping. The scheme has $\tilde{O}(\lambda \cdot L^3)$
per-gate computation (namely, quasi-linear in the security parameter).

• Alternatively, assuming Ring LWE is hard for quasi-polynomial factors, we have a leveled FHE
scheme that uses bootstrapping as an optimization, where the per-gate computation (which includes
the bootstrapping procedure) is $\tilde{O}(\lambda^2)$, independent of $L$.

We  can  alternatively  base  security  on  LWE,  albeit  with  worse  performance.  We  now  sketch  our  main  idea 
for  boosting  efficiency. 

In the BV scheme [3], like ours, a ciphertext vector $\mathbf{c} \in R^n$ (where $R$ is a ring, and $n$ is the “dimension”
of the vector) that encrypts a message $m$ satisfies the decryption formula $m = [[\langle \mathbf{c}, \mathbf{s} \rangle]_q]_2$, where $\mathbf{s} \in R^n$ is
the secret key vector, $q$ is an odd modulus, and $[\cdot]_q$ denotes reduction into the range $(-q/2, q/2)$. This is an
abstract scheme that can be instantiated with either LWE or Ring LWE: in the LWE instantiation, $R$ is the
ring of integers mod $q$ and $n$ is a large dimension, whereas in the Ring LWE instantiation, $R$ is the ring of
polynomials over integers mod $q$ and an irreducible $f(x)$, and the dimension $n = 1$.
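To make the decryption formula concrete, here is a minimal sketch of $m = [[\langle \mathbf{c}, \mathbf{s} \rangle]_q]_2$ in the LWE-style instantiation, where vectors are plain integer lists. The helper `toy_ciphertext` is hypothetical: it simply builds a random vector whose inner product with a key having $\mathbf{s}[0] = 1$ equals $m + 2e$ modulo $q$, and the modulus and key sizes are toy values with no security.

```python
import random

def centered(x: int, q: int) -> int:
    """[x]_q: the representative of x mod q in the range (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def decrypt(c, s, q):
    """m = [[<c, s>]_q]_2 for integer vectors c and s."""
    return centered(sum(ci * si for ci, si in zip(c, s)), q) % 2

def toy_ciphertext(m, e, s, q, rng):
    """Hypothetical helper: a random c with <c, s> = m + 2e (mod q), using s[0] = 1."""
    tail = [rng.randrange(q) for _ in range(len(s) - 1)]
    head = (m + 2 * e - sum(t * si for t, si in zip(tail, s[1:]))) % q
    return [head] + tail

rng = random.Random(0)
q = 1_000_003                                    # odd toy modulus, far too small to be secure
s = [1] + [rng.choice([-1, 0, 1]) for _ in range(16)]
for m in (0, 1):
    c = toy_ciphertext(m, rng.randint(-50, 50), s, q, rng)
    print(m, decrypt(c, s, q))                   # the recovered bit equals m
```

Decryption only works because the noise $m + 2e$ stays far below $q/2$; the centered reduction is what lets a negative noise value still carry the right parity.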

4We note that bootstrapping lazily, i.e., applying the refresh procedure only at a $1/k$ fraction of the circuit levels for $k > 1$,
cannot reduce the per-gate computation further by more than a logarithmic factor for schemes that follow this blueprint, since these
SWHE schemes can evaluate only log multiplicative depth before it becomes absolutely necessary to refresh, i.e., $k = O(\log \lambda)$.
We will call $[\langle \mathbf{c}, \mathbf{s} \rangle]_q$ the noise associated to ciphertext $\mathbf{c}$ under key $\mathbf{s}$. Decryption succeeds as long as
the magnitude of the noise stays smaller than $q/2$. Homomorphic addition and multiplication increase the
noise in the ciphertext. Addition of two ciphertexts with noise at most $B$ results in a ciphertext with noise at
most $2B$, whereas multiplication results in a noise as large as $B^2$.5 We will describe a noise-management
technique that keeps the noise in check by reducing it after homomorphic operations, without bootstrapping.
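The following sketch illustrates how noise grows under addition and multiplication. It assumes, as in BV-style schemes, that the product of two ciphertexts before relinearization is the tensor product $\mathbf{c}_1 \otimes \mathbf{c}_2$, which decrypts under $\mathbf{s} \otimes \mathbf{s}$ because $\langle \mathbf{c}_1 \otimes \mathbf{c}_2, \mathbf{s} \otimes \mathbf{s} \rangle = \langle \mathbf{c}_1, \mathbf{s} \rangle \cdot \langle \mathbf{c}_2, \mathbf{s} \rangle$. Relinearization is deliberately omitted, matching the footnote's simplification. All helper names and parameters are hypothetical toy choices.

```python
import random
from itertools import product

def centered(x, q):
    """[x]_q: the representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def noise(c, s, q):
    """The noise [<c, s>]_q of ciphertext c under key s."""
    return centered(inner(c, s), q)

def tensor(u, v):
    """Flattened tensor product u (x) v, with entries ordered by (i, j)."""
    return [a * b for a, b in product(u, v)]

def toy_ciphertext(m, e, s, q, rng):
    """Hypothetical helper: a random c with <c, s> = m + 2e (mod q), using s[0] = 1."""
    tail = [rng.randrange(q) for _ in range(len(s) - 1)]
    return [(m + 2 * e - inner(tail, s[1:])) % q] + tail

rng = random.Random(1)
q = (1 << 61) - 1                          # odd toy modulus, not a secure choice
s = [1] + [rng.choice([-1, 0, 1]) for _ in range(8)]
c1 = toy_ciphertext(1, 40, s, q, rng)      # noise 1 + 2*40 = 81
c2 = toy_ciphertext(1, -30, s, q, rng)     # noise 1 - 2*30 = -59

c_add = [(a + b) % q for a, b in zip(c1, c2)]
c_mul = [x % q for x in tensor(c1, c2)]
s_mul = tensor(s, s)

print(noise(c_add, s, q))                  # prints 22: the noises add
print(noise(c_mul, s_mul, q))              # prints -4779: the noises multiply
```

The product ciphertext has squared length and its noise is the product of the input noises, which is exactly the $B^2$ growth described above.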

The  key  technical  tool  we  use  for  noise  management  is  the  “modulus  switching”  technique  developed  by 
Brakerski  and  Vaikuntanathan  [3].  Jumping  ahead,  we  note  that  while  they  use  modulus  switching  in  “one 
shot”  to  obtain  a  small  ciphertext  (to  which  they  then  apply  Gentry’s  bootstrapping  procedure),  we  will  use 
it  (iteratively,  gradually)  to  keep  the  noise  level  essentially  constant,  while  stingily  sacrificing  modulus  size 
and  gradually  sacrificing  the  remaining  homomorphic  capacity  of  the  scheme. 

Modulus Switching. The essence of the modulus-switching technique is captured in the following lemma.
In words, the lemma says that an evaluator, who does not know the secret key $\mathbf{s}$ but instead only knows a
bound on its length, can transform a ciphertext $\mathbf{c}$ modulo $q$ into a different ciphertext $\mathbf{c}'$ modulo $p$ while
preserving correctness, namely $[\langle \mathbf{c}', \mathbf{s} \rangle]_p = [\langle \mathbf{c}, \mathbf{s} \rangle]_q \bmod 2$. The transformation from $\mathbf{c}$ to $\mathbf{c}'$ involves
simply scaling by $(p/q)$ and rounding appropriately! Most interestingly, if $\mathbf{s}$ is short and $p$ is sufficiently
smaller than $q$, the “noise” in the ciphertext actually decreases, namely $|[\langle \mathbf{c}', \mathbf{s} \rangle]_p| < |[\langle \mathbf{c}, \mathbf{s} \rangle]_q|$.

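Lemma 1. Let $p$ and $q$ be two odd moduli, and let $\mathbf{c}$ be an integer vector. Define $\mathbf{c}'$ to be the integer vector
closest to $(p/q) \cdot \mathbf{c}$ such that $\mathbf{c}' = \mathbf{c} \bmod 2$. Then, for any $\mathbf{s}$ with $|[\langle \mathbf{c}, \mathbf{s} \rangle]_q| < q/2 - (q/p) \cdot \ell_1(\mathbf{s})$, we have

$$[\langle \mathbf{c}', \mathbf{s} \rangle]_p = [\langle \mathbf{c}, \mathbf{s} \rangle]_q \bmod 2 \quad \text{and} \quad |[\langle \mathbf{c}', \mathbf{s} \rangle]_p| \le (p/q) \cdot |[\langle \mathbf{c}, \mathbf{s} \rangle]_q| + \ell_1(\mathbf{s}),$$

where $\ell_1(\mathbf{s})$ is the $\ell_1$-norm of $\mathbf{s}$.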

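Proof. For some integer $k$, we have $[\langle \mathbf{c}, \mathbf{s} \rangle]_q = \langle \mathbf{c}, \mathbf{s} \rangle - kq$. For the same $k$, let $e_p = \langle \mathbf{c}', \mathbf{s} \rangle - kp \in \mathbb{Z}$. Since
$\mathbf{c}' = \mathbf{c}$ and $p = q$ modulo 2, we have $e_p = [\langle \mathbf{c}, \mathbf{s} \rangle]_q \bmod 2$. Therefore, to prove the lemma, it suffices to
prove that $e_p = [\langle \mathbf{c}', \mathbf{s} \rangle]_p$ and that it has small enough norm. We have $e_p = (p/q)[\langle \mathbf{c}, \mathbf{s} \rangle]_q + \langle \mathbf{c}' - (p/q)\mathbf{c}, \mathbf{s} \rangle$,
and since every coordinate of $\mathbf{c}' - (p/q)\mathbf{c}$ has magnitude at most 1, $|e_p| \le (p/q) \cdot |[\langle \mathbf{c}, \mathbf{s} \rangle]_q| + \ell_1(\mathbf{s}) < p/2$.
The latter inequality implies $e_p = [\langle \mathbf{c}', \mathbf{s} \rangle]_p$. □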
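A minimal sketch of the rounding step in Lemma 1, assuming integer vectors and exact rational arithmetic via `fractions.Fraction`. The function name `switch_modulus` and all parameters are hypothetical illustrations, not the scheme's actual routine. For each coordinate it picks the integer of the right parity closest to $(p/q)\,c_i$, which is always within distance 1, as the proof requires.

```python
import math
import random
from fractions import Fraction

def centered(x: int, q: int) -> int:
    """[x]_q: the representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def switch_modulus(c, q, p):
    """Return c', the integer vector closest to (p/q)*c with c' = c (mod 2)."""
    out = []
    for ci in c:
        x = Fraction(p * ci, q)
        r = math.floor(x)
        # Of r-1, r, r+1, keep those with the parity of ci; the closest is within 1 of x.
        cands = [k for k in (r - 1, r, r + 1) if (k - ci) % 2 == 0]
        out.append(min(cands, key=lambda k: abs(x - k)))
    return out

rng = random.Random(2)
q = (1 << 40) + 15                         # large odd modulus
p = (1 << 20) + 7                          # much smaller odd modulus
s = [1] + [rng.choice([-1, 0, 1]) for _ in range(8)]    # short key with s[0] = 1
tail = [rng.randrange(q) for _ in range(8)]
c = [(1 + 2 * 12345 - inner(tail, s[1:])) % q] + tail   # noise 24691 modulo q

c_new = switch_modulus(c, q, p)
before = centered(inner(c, s), q)
after = centered(inner(c_new, s), p)
l1 = sum(abs(x) for x in s)
print(before, after)                       # the parity is kept, the magnitude shrinks
```

Only public quantities ($\mathbf{c}$, $q$, $p$) enter `switch_modulus`; the key is used here solely to measure the noise before and after.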

Amazingly,  this  trick  permits  the  evaluator  to  reduce  the  magnitude  of  the  noise  without  knowing  the 
secret  key,  and  without  bootstrapping.  In  other  words,  modulus  switching  gives  us  a  very  powerful  and 
lightweight  way  to  manage  the  noise  in  FHE  schemes!  In  [3],  the  modulus  switching  technique  is  bundled 
into  a  “dimension  reduction”  procedure,  and  we  believe  it  deserves  a  separate  name  and  close  scrutiny.  It  is 
also  worth  noting  that  our  use  of  modulus  switching  does  not  require  an  “evaluation  key”,  in  contrast  to  [3]. 

Our  New  Noise  Management  Technique.  At  first,  it  may  look  like  modulus  switching  is  not  a  very 
effective noise management tool. If $p$ is smaller than $q$, then of course modulus switching may reduce
the  magnitude  of  the  noise,  but  it  reduces  the  modulus  size  by  essentially  the  same  amount.  In  short,  the 
ratio  of  the  noise  to  the  “noise  ceiling”  (the  modulus  size)  does  not  decrease  at  all.  Isn’t  this  ratio  what 
dictates  the  remaining  homomorphic  capacity  of  the  scheme,  and  how  can  potentially  worsening  (certainly 
not  improving)  this  ratio  do  anything  useful? 

In fact, it’s not just the ratio of the noise to the “noise ceiling” that’s important. The absolute magnitude
of the noise is also important, especially in multiplications. Suppose that $q \approx x^k$, and that you have two
mod-$q$ SWHE ciphertexts with noise of magnitude $x$. If you multiply them, the noise becomes $x^2$. After
4 levels of multiplication, the noise is $x^{16}$. If you do another multiplication at this point, you reduce the
ratio of the noise ceiling (i.e., $q$) to the noise level by a huge factor of $x^{16}$; that is, you reduce this gap very
fast. Thus, the actual magnitude of the noise impacts how fast this gap is reduced. After only $\log k$ levels of
multiplication, the noise level reaches the ceiling.

5The noise after multiplication is in fact a bit larger than $B^2$ due to the additional noise from the BV “re-linearization” process.
For the purposes of this exposition, it is best to ignore this minor detail.

Now, consider the following alternative approach. Choose a ladder of gradually decreasing moduli
$\{q_i \approx q/x^i\}$ for $i < k$. After you multiply the two mod-$q$ ciphertexts, switch the ciphertext to the smaller
modulus $q_1 = q/x$. As the lemma above shows, the noise level of the new ciphertext (now with respect to
the modulus $q_1$) goes from $x^2$ back down to $x$. (Let’s suppose for now that $\ell_1(\mathbf{s})$ is small in comparison to $x$
so that we can ignore it.) Now, when we multiply two ciphertexts (with respect to modulus $q_1$) that have noise level $x$,
the noise again becomes $x^2$, but then we switch to modulus $q_2$ to reduce the noise back to $x$. In short, each
level of multiplication only reduces the ratio (noise ceiling)/(noise level) by a factor of $x$ (not something like
$x^{16}$). With this new approach, we can perform about $k$ (not just $\log k$) levels of multiplication before we
reach the noise ceiling. We have just increased (without bootstrapping) the number of multiplicative levels
that we can evaluate by an exponential factor!
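The counting argument above can be checked with a small bookkeeping simulation. This is a sketch under the same idealizations as the text: noise is tracked only by its bit length, it is exactly $x^{2^j}$ after $j$ squarings without switching, and the $\ell_1(\mathbf{s})$ term and relinearization noise are ignored. The function names are hypothetical.

```python
def levels_without_switching(log_x: int, k: int) -> int:
    """Squaring levels possible when q = x^k and the noise starts at x (no switching)."""
    log_q = k * log_x
    log_noise = log_x
    levels = 0
    while 2 * log_noise < log_q:    # the product's noise must stay below q
        log_noise *= 2
        levels += 1
    return levels

def levels_with_ladder(log_x: int, k: int) -> int:
    """Levels possible with the ladder q_i = q / x^i, switching after each product."""
    log_qi = k * log_x
    levels = 0
    while 2 * log_x < log_qi:       # the product noise x^2 must stay below q_i
        log_qi -= log_x             # switch to q_{i+1} = q_i / x; noise returns to x
        levels += 1
    return levels

for k in (16, 64, 256):
    print(k, levels_without_switching(10, k), levels_with_ladder(10, k))
```

Without switching the count grows like $\log k$ (3, 5, 7 here), while the ladder reaches about $k$ levels (14, 62, 254), which is the exponential gap claimed above.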

This exponential improvement is enough to achieve leveled FHE without bootstrapping. For any polynomial
$k$, we can evaluate circuits of depth $k$. The performance of the scheme degrades with $k$ (e.g., we
need to set $q = q_0$ to have bit length proportional to $k$), but it degrades only polynomially with $k$.

Our main observation, the key to obtaining FHE without bootstrapping, is so simple that it is easy
to miss and bears repeating: We get noise reduction automatically via modulus switching, and by carefully
calibrating our ladder of moduli $\{q_i\}$, one modulus for each circuit level, to be decreasing gradually, we
can  keep  the  noise  level  very  small  and  essentially  constant  from  one  level  to  the  next  while  only  gradually 
sacrificing  the  size  of  our  modulus  until  the  ladder  is  used  up.  With  this  approach,  we  can  efficiently  evaluate 
arbitrary  polynomial-size  arithmetic  circuits  without  resorting  to  bootstrapping. 

Performance-wise, this scheme trounces previous (bootstrapping-based) FHE schemes (at least asymptotically;
the concrete performance remains to be seen). Instantiated with ring-LWE, it can evaluate $L$-level
arithmetic circuits with per-gate computation $\tilde{O}(\lambda \cdot L^3)$, i.e., computation quasi-linear in the security parameter.
Since the ratio of the largest modulus (namely, $q \approx x^L$) to the noise (namely, $x$) is exponential in
$L$, the scheme relies on the hardness of approximating short vectors to within an exponential in $L$ factor.

Bootstrapping for Better Efficiency and Better Assumptions. The per-gate computation of our FHE-without-bootstrapping
scheme depends polynomially on the number of levels in the circuit that is being
evaluated. While this approach is efficient (in the sense of “polynomial time”) for polynomial-size circuits,
the per-gate computation may become undesirably high for very deep circuits. So, we re-introduce bootstrapping
as an optimization6 that makes the per-gate computation independent of the circuit depth, and that
(if one is willing to assume circular security) allows homomorphic operations to be performed indefinitely
without needing to specify in advance a bound on the number of circuit levels. The main idea is that to
compute arbitrary polynomial-depth circuits, it is enough to compute the decryption circuit of the scheme
homomorphically. Since the decryption circuit has depth $\approx \log \lambda$, the largest modulus we need has only
$\tilde{O}(\lambda)$ bits, and therefore we can base security on the hardness of lattice problems with quasi-polynomial
factors. Since the decryption circuit has size $\tilde{O}(\lambda)$ for the RLWE-based instantiation, the per-gate computation
becomes $\tilde{O}(\lambda^2)$ (independent of $L$). See Section 5 for details.

Other  Optimizations.  We  also  consider  batching  as  an  optimization.  The  idea  behind  batching  is  to  pack 
multiple plaintexts into each ciphertext so that a function can be homomorphically evaluated on multiple
inputs  with  approximately  the  same  efficiency  as  homomorphically  evaluating  it  on  one  input. 

6We are aware of the seeming irony of trumpeting “FHE without bootstrapping” and then proposing bootstrapping “as an optimization”.
First, FHE without bootstrapping is exciting theoretically, independent of performance. Second, whether bootstrapping
actually  improves  performance  depends  crucially  on  the  number  of  levels  in  the  circuit  one  is  evaluating.  For  example,  for  circuits 
of  depth  sub-polynomial  in  the  security  parameter,  this  “optimization”  will  not  improve  performance  asymptotically. 
An especially interesting case is batching the decryption function so that multiple ciphertexts (e.g., all
of the ciphertexts associated to gates at some level in the circuit) can be bootstrapped simultaneously very
efficiently. For circuits of large width (say, width $\lambda$), batched bootstrapping reduces the per-gate computation
in the RLWE-based instantiation to $\tilde{O}(\lambda)$, independent of $L$. We give the details in Section 5.

1.3  Other  Related  Work 

We note that prior to Gentry’s construction, there were already a few interesting homomorphic encryption
schemes that could be called “somewhat homomorphic”, including Boneh-Goh-Nissim [2] (evaluates
quadratic formulas using bilinear maps), (Aguilar Melchor)-Gaborit-Herranz [15] (evaluates constant degree
polynomials using lattices) and Ishai-Paskin [12] (evaluates branching programs).

2  Preliminaries 

Basic Notation. In our construction, we will use a ring $R$. In our concrete instantiations, we prefer to use
either $R = \mathbb{Z}$ (the integers) or the polynomial ring $R = \mathbb{Z}[x]/(x^d + 1)$, where $d$ is a power of 2.

We write elements of $R$ in lowercase, e.g., $r \in R$. We write vectors in bold, e.g., $\mathbf{v} \in R^n$. The notation
$\mathbf{v}[i]$ refers to the $i$-th coefficient of $\mathbf{v}$. We write the dot product of $\mathbf{u}, \mathbf{v} \in R^n$ as $\langle \mathbf{u}, \mathbf{v} \rangle = \sum_i \mathbf{u}[i] \cdot \mathbf{v}[i] \in R$.
When $R$ is a polynomial ring, $\|r\|$ for $r \in R$ refers to the Euclidean norm of $r$’s coefficient vector. We
say $\gamma_R = \max\{\|a \cdot b\| / (\|a\|\,\|b\|) : a, b \in R\}$ is the expansion factor of $R$. For $R = \mathbb{Z}[x]/(x^d + 1)$, the value
of $\gamma_R$ is at most $\sqrt{d}$ by Cauchy-Schwarz.
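As an illustration of multiplication in $R = \mathbb{Z}[x]/(x^d + 1)$ and of the expansion-factor bound, the sketch below multiplies coefficient vectors negacyclically (since $x^d = -1$, wrapped terms change sign) and records the largest ratio $\|a \cdot b\| / (\|a\|\,\|b\|)$ over random samples. This only spot-checks the upper bound $\gamma_R \le \sqrt{d}$ empirically; it does not compute $\gamma_R$ exactly. The function names are hypothetical.

```python
import math
import random

def negacyclic_mul(a, b):
    """Product of a and b in Z[x]/(x^d + 1); coefficient lists, lowest degree first."""
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < d:
                out[i + j] += ai * bj
            else:
                out[i + j - d] -= ai * bj   # x^d = -1: wrap around with a sign flip
    return out

def norm(v):
    """Euclidean norm of a coefficient vector."""
    return math.sqrt(sum(x * x for x in v))

rng = random.Random(3)
d = 16
worst = 0.0
for _ in range(1000):
    a = [rng.randint(-5, 5) for _ in range(d)]
    b = [rng.randint(-5, 5) for _ in range(d)]
    if norm(a) > 0 and norm(b) > 0:
        worst = max(worst, norm(negacyclic_mul(a, b)) / (norm(a) * norm(b)))
print(worst, math.sqrt(d))                  # the observed ratio never exceeds sqrt(d)
```

The bound follows because each product coefficient is a signed sum of $d$ terms $a_i b_j$, so Cauchy-Schwarz bounds it by $\|a\|\,\|b\|$, and there are $d$ coefficients.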

For integer $q$, we use $R_q$ to denote $R/qR$. Sometimes we will abuse notation and use $R_2$ to denote
the set of $R$-elements with binary coefficients; e.g., when $R = \mathbb{Z}$, $R_2$ may denote $\{0, 1\}$, and when $R$ is a
polynomial ring, $R_2$ may denote those polynomials that have 0/1 coefficients. When it is obvious that $q$ is
not a power of two, we will use $\lceil \log q \rceil$ to denote $1 + \lfloor \log q \rfloor$. For $a \in R$, we use the notation $[a]_q$ to refer
to $a \bmod q$, with coefficients reduced into the range $(-q/2, q/2]$.
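The notation above maps directly onto a few small helpers. In this sketch (hypothetical names), a polynomial is a list of integer coefficients; `reduce_q` implements $[a]_q$ with the half-open range $(-q/2, q/2]$, `to_r2` gives the 0/1-coefficient representative in $R_2$, and Python's `int.bit_length` computes $1 + \lfloor \log_2 q \rfloor$ for $q \ge 1$.

```python
def reduce_q(a, q):
    """[a]_q for a polynomial given as a coefficient list: each coefficient
    is mapped to its representative in (-q/2, q/2]."""
    out = []
    for x in a:
        r = x % q
        out.append(r - q if r > q // 2 else r)
    return out

def to_r2(a):
    """Reduce coefficients mod 2, giving an element of R_2 with 0/1 coefficients."""
    return [x % 2 for x in a]

def log_q_bits(q):
    """1 + floor(log2 q), the text's reading of ceil(log q) when q is not a power of two."""
    return q.bit_length()

print(reduce_q([0, 5, 6, 10, -7], 10))   # [0, 5, -4, 0, 3]
print(to_r2([3, -3, 4]), log_q_bits(11))  # [1, 1, 0] 4
```

For even $q$ the value $q/2$ is kept as is, which is why the range is half-open; for the odd moduli used later the range is effectively $(-q/2, q/2)$.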

Leveled  Fully  Homomorphic  Encryption.  Most  of  this  paper  will  focus  on  the  construction  of  a  leveled 
fully  homomorphic  scheme,  in  the  sense  that  the  parameters  of  the  scheme  depend  (polynomially)  on  the 
depth  of  the  circuits  that  the  scheme  is  capable  of  evaluating. 

Definition 1 (Leveled Fully Homomorphic Encryption [7]). We say that a family of homomorphic encryption
schemes $\{\mathcal{E}^{(L)} : L \in \mathbb{Z}^+\}$ is leveled fully homomorphic if, for all $L \in \mathbb{Z}^+$, they all use the same decryption
circuit, $\mathcal{E}^{(L)}$ compactly evaluates all circuits of depth at most $L$ (that use some specified complete set of
gates), and the computational complexity of $\mathcal{E}^{(L)}$’s algorithms is polynomial (the same polynomial for all
$L$) in the security parameter, $L$, and (in the case of the evaluation algorithm) the size of the circuit.

2.1  The  Learning  with  Errors  (LWE)  Problem 

The  learning  with  errors  (LWE)  problem  was  introduced  by  Regev  [17].  It  is  defined  as  follows. 

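Definition 2 (LWE). For security parameter $\lambda$, let $n = n(\lambda)$ be an integer dimension, let $q = q(\lambda) \ge 2$ be an
integer, and let $\chi = \chi(\lambda)$ be a distribution over $\mathbb{Z}$. The $\mathrm{LWE}_{n,q,\chi}$ problem is to distinguish the following two
distributions: In the first distribution, one samples $(\mathbf{a}_i, b_i)$ uniformly from $\mathbb{Z}_q^{n+1}$. In the second distribution,
one first draws $\mathbf{s} \leftarrow \mathbb{Z}_q^n$ uniformly and then samples $(\mathbf{a}_i, b_i) \in \mathbb{Z}_q^{n+1}$ by sampling $\mathbf{a}_i \leftarrow \mathbb{Z}_q^n$ uniformly,
$e_i \leftarrow \chi$, and setting $b_i = \langle \mathbf{a}_i, \mathbf{s} \rangle + e_i$. The $\mathrm{LWE}_{n,q,\chi}$ assumption is that the $\mathrm{LWE}_{n,q,\chi}$ problem is infeasible.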
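For intuition, here is a toy sampler for the two distributions in Definition 2. It assumes, purely for illustration, that $\chi$ is uniform on $[-B, B]$ (a simple stand-in for a $B$-bounded Gaussian), and the dimension and modulus are far too small for any security. Someone holding $\mathbf{s}$ sees that $b_i - \langle \mathbf{a}_i, \mathbf{s} \rangle$ is tiny for LWE samples; the assumption is that without $\mathbf{s}$ the two sample sets look alike. The function names are hypothetical.

```python
import random

def centered(x, q):
    """[x]_q: the representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def lwe_instance(n, q, num, bound, rng):
    """Samples from the second (LWE) distribution. The noise is uniform on
    [-bound, bound], a hypothetical stand-in for a B-bounded chi."""
    s = [rng.randrange(q) for _ in range(n)]
    samples = []
    for _ in range(num):
        a = [rng.randrange(q) for _ in range(n)]
        e = rng.randint(-bound, bound)
        b = (sum(x * y for x, y in zip(a, s)) + e) % q
        samples.append((a, b))
    return s, samples

def uniform_instance(n, q, num, rng):
    """Samples from the first distribution: (a_i, b_i) uniform in Z_q^(n+1)."""
    return [([rng.randrange(q) for _ in range(n)], rng.randrange(q)) for _ in range(num)]

rng = random.Random(4)
q = 3329                                   # toy modulus, not a secure choice
s, lwe = lwe_instance(n=8, q=q, num=5, bound=3, rng=rng)
unif = uniform_instance(n=8, q=q, num=5, rng=rng)
residuals = [centered(b - sum(x * y for x, y in zip(a, s)), q) for a, b in lwe]
print(residuals)                           # each residual lies in [-3, 3]
```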

Regev [17] proved that for certain moduli $q$ and Gaussian error distributions $\chi$, the $\mathrm{LWE}_{n,q,\chi}$ assumption
is true as long as certain worst-case lattice problems are hard to solve using a quantum algorithm. We state
this result using the terminology of $B$-bounded distributions: a $B$-bounded distribution is a distribution over the integers where
the magnitude of a sample is bounded by $B$ with high probability. A definition follows.



2.  Fully  Homomorphic  Encryption  without  Bootstrapping 


Definition 3 ($B$-bounded distributions). A distribution ensemble $\{\chi_n\}_{n \in \mathbb{N}}$, supported over the integers, is
called $B$-bounded if

$$\Pr_{e \leftarrow \chi_n}\big[\,|e| > B\,\big] = \mathrm{negl}(n).$$

We  can  now  state  Regev’s  worst-case  to  average-case  reduction  for  LWE. 

Theorem 1 (Regev [17]). For any integer dimension $n$, prime integer $q = q(n)$, and $B = B(n) \ge 2n$, there
is an efficiently samplable $B$-bounded distribution $\chi$ such that if there exists an efficient (possibly quantum)
algorithm that solves $\mathrm{LWE}_{n,q,\chi}$, then there is an efficient quantum algorithm for solving $\tilde{O}(qn^{1.5}/B)$-approximate
worst-case SIVP and gapSVP.

Peikert [16] de-quantized Regev’s results to some extent; that is, he showed the $\mathrm{LWE}_{n,q,\chi}$ assumption
is  true  as  long  as  certain  worst-case  lattice  problems  are  hard  to  solve  using  a  classical  algorithm.  (See  [16] 
for  a  precise  statement  of  these  results.) 

Applebaum  et  al.  [1]  showed  that  if  LWE  is  hard  for  the  above  distribution  of  s,  then  it  is  also  hard  when 
$\mathbf{s}$’s coefficients are sampled according to the noise distribution $\chi$.

2.2  The  Ring  Learning  with  Errors  (RLWE)  Problem 

The ring learning with errors (RLWE) problem was introduced by Lyubashevsky, Peikert and Regev [14].
We will use a simplified special-case version of the problem that is easier to work with [18, 4].

Definition 4 (RLWE). For security parameter $\lambda$, let $f(x) = x^d + 1$ where $d = d(\lambda)$ is a power of 2. Let
$q = q(\lambda) \ge 2$ be an integer. Let $R = \mathbb{Z}[x]/(f(x))$ and let $R_q = R/qR$. Let $\chi = \chi(\lambda)$ be a distribution over
$R$. The $\mathrm{RLWE}_{d,q,\chi}$ problem is to distinguish the following two distributions: In the first distribution, one
samples $(a_i, b_i)$ uniformly from $R_q^2$. In the second distribution, one first draws $s \leftarrow R_q$ uniformly and then
samples $(a_i, b_i) \in R_q^2$ by sampling $a_i \leftarrow R_q$ uniformly, $e_i \leftarrow \chi$, and setting $b_i = a_i \cdot s + e_i$. The $\mathrm{RLWE}_{d,q,\chi}$
assumption is that the $\mathrm{RLWE}_{d,q,\chi}$ problem is infeasible.

The  RLWE  problem  is  useful,  because  the  well-established  shortest  vector  problem  (SVP)  over  ideal 
lattices  can  be  reduced  to  it,  specifically: 

Theorem 2 (Lyubashevsky-Peikert-Regev [14]). For any $d$ that is a power of 2, ring $R = \mathbb{Z}[x]/(x^d + 1)$,
prime integer $q = q(d) = 1 \bmod d$, and $B = \omega(\sqrt{d \log d})$, there is an efficiently samplable distribution $\chi$
that outputs elements of $R$ of length at most $B$ with overwhelming probability, such that if there exists an
efficient algorithm that solves $\mathrm{RLWE}_{d,q,\chi}$, then there is an efficient quantum algorithm for solving $d^{\omega(1)} \cdot (q/B)$-approximate
worst-case SVP for ideal lattices over $R$.

Typically, to use RLWE with a cryptosystem, one chooses the noise distribution $\chi$ according to a Gaussian
distribution, where vectors sampled according to this distribution have length only $\mathrm{poly}(d)$ with overwhelming
probability. This Gaussian distribution may need to be “ellipsoidal” for certain reductions to go
through [14]. It has been shown for RLWE that one can equivalently assume that $s$ is alternatively sampled
from the noise distribution $\chi$ [14].

2.3  The  General  Learning  with  Errors  (GLWE)  Problem 

The learning with errors (LWE) problem and the ring learning with errors problem RLWE are syntactically
identical, aside from using different rings ($\mathbb{Z}$ versus a polynomial ring) and different vector dimensions over
those rings ($n = \mathrm{poly}(\lambda)$ for LWE, but $n$ is constant, namely 1, in the RLWE case). To simplify our
presentation,  we  define  a  “General  Learning  with  Errors  (GLWE)”  Problem,  and  describe  a  single  “GLWE- 
based”  FHE  scheme,  rather  than  presenting  essentially  the  same  scheme  twice,  once  for  each  of  our  two 
concrete  instantiations. 




Definition 5 (GLWE). For security parameter $\lambda$, let $n = n(\lambda)$ be an integer dimension, let $f(x) = x^d + 1$
where $d = d(\lambda)$ is a power of 2, let $q = q(\lambda) \ge 2$ be a prime integer, let $R = \mathbb{Z}[x]/(f(x))$ and $R_q = R/qR$,
and let $\chi = \chi(\lambda)$ be a distribution over $R$. The $\mathrm{GLWE}_{n,f,q,\chi}$ problem is to distinguish the following two
distributions: In the first distribution, one samples $(\mathbf{a}_i, b_i)$ uniformly from $R_q^{n+1}$. In the second distribution,
one first draws $\mathbf{s} \leftarrow R_q^n$ uniformly and then samples $(\mathbf{a}_i, b_i) \in R_q^{n+1}$ by sampling $\mathbf{a}_i \leftarrow R_q^n$ uniformly,
$e_i \leftarrow \chi$, and setting $b_i = \langle \mathbf{a}_i, \mathbf{s} \rangle + e_i$. The $\mathrm{GLWE}_{n,f,q,\chi}$ assumption is that the $\mathrm{GLWE}_{n,f,q,\chi}$ problem is
infeasible.
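One way to see that GLWE covers both earlier problems is to write a single sampler parameterized by $(n, d)$: with $d = 1$ the ring $R_q$ collapses to $\mathbb{Z}_q$ (LWE), and with $n = 1$ it is plain RLWE. In this sketch the noise coefficients are uniform on $[-B, B]$ as a hypothetical stand-in for $\chi$, the modulus is a toy value, and all names are illustrative only.

```python
import random

def centered(x, q):
    """[x]_q: the representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def poly_mul(a, b, q):
    """Product of a and b in R_q = Z_q[x]/(x^d + 1); coefficient lists, low degree first."""
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < d:
                out[i + j] = (out[i + j] + ai * bj) % q
            else:
                out[i + j - d] = (out[i + j - d] - ai * bj) % q   # x^d = -1
    return out

def inner_rq(u, v, q):
    """<u, v> = sum_i u[i] * v[i] for vectors over R_q."""
    acc = [0] * len(u[0])
    for ui, vi in zip(u, v):
        acc = [(x + y) % q for x, y in zip(acc, poly_mul(ui, vi, q))]
    return acc

def glwe_sample(s, q, bound, rng):
    """One sample (a, b = <a, s> + e) with a uniform in R_q^n; e has coefficients
    uniform in [-bound, bound], a hypothetical stand-in for chi."""
    d = len(s[0])
    a = [[rng.randrange(q) for _ in range(d)] for _ in s]
    e = [rng.randint(-bound, bound) for _ in range(d)]
    b = [(x + y) % q for x, y in zip(inner_rq(a, s, q), e)]
    return a, b

rng = random.Random(5)
q = 12289                                  # toy modulus, not a secure parameter choice
results = []
for n, d in ((8, 1), (1, 8), (2, 4)):     # LWE (d = 1), RLWE (n = 1), and a case in between
    s = [[rng.randrange(q) for _ in range(d)] for _ in range(n)]
    a, b = glwe_sample(s, q, bound=2, rng=rng)
    residual = [centered(x - y, q) for x, y in zip(b, inner_rq(a, s, q))]
    results.append((n, d, max(abs(r) for r in residual)))
print(results)                             # with the secret, b - <a, s> is visibly small
```

The in-between case $(n, d) = (2, 4)$ is exactly the kind of instance the next paragraph notes has not been explored.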

LWE is simply GLWE instantiated with $d = 1$. RLWE is GLWE instantiated with $n = 1$. Interestingly, as
far as we know, instances of GLWE between these extremes have not been explored. One would suspect
that GLWE is hard for any $(n, d)$ such that $n \cdot d = \Omega(\lambda \log(q/B))$, where $B$ is a bound (with overwhelming
probability) on the length of elements output by $\chi$. For fixed $n \cdot d$, perhaps GLWE gradually becomes harder
as $n$ increases (if it is true that general lattice problems are harder than ideal lattice problems), whereas
increasing $d$ is probably often preferable for efficiency.

If $q$ is much larger than $B$, the associated GLWE problem is believed to be easier (i.e., there is less
security). Previous FHE schemes required $q/B$ to be sub-exponential in $n$ or $d$ to give room for the noise
to grow as homomorphic operations (especially multiplication) are performed. In our FHE scheme without
bootstrapping, $q/B$ will be exponential in the number of circuit levels to be evaluated. However, since
the decryption circuit can be evaluated in logarithmic depth, the bootstrapped version of our scheme will
only need $q/B$ to be quasi-polynomial, and we thus base security on lattice problems for quasi-polynomial
approximation factors.

The GLWE assumption implies that the distribution $\{(\mathbf{a}_i, \langle \mathbf{a}_i, \mathbf{s} \rangle + t \cdot e_i)\}$ is computationally indistinguishable
from uniform for any $t$ relatively prime to $q$. This fact will be convenient for encryption, where, for
example, a message $m$ may be encrypted as $(\mathbf{a}, \langle \mathbf{a}, \mathbf{s} \rangle + 2e + m)$, and this fact can be used to argue that the
second component of this ciphertext is indistinguishable from random.
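A minimal sketch of the encoding $(\mathbf{a}, \langle \mathbf{a}, \mathbf{s} \rangle + 2e + m)$ in the RLWE case ($n = 1$, so the inner product is a single ring product), with $t = 2$ and an odd modulus so that $t$ is relatively prime to $q$. It is a hypothetical secret-key illustration with toy parameters and uniform noise standing in for $\chi$, not the public-key scheme defined in the next section. Decryption computes $[[b - a \cdot s]_q]_2$ coefficient by coefficient.

```python
import random

def centered(x, q):
    """[x]_q: the representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def poly_mul(a, b, q):
    """Product in Z_q[x]/(x^d + 1); coefficient lists, lowest degree first."""
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            sign = 1 if k < d else -1            # x^d = -1
            out[k % d] = (out[k % d] + sign * ai * bj) % q
    return out

def encrypt(m, s, q, bound, rng):
    """Hypothetical secret-key sketch: (a, a*s + 2e + m) with m in R_2 (0/1 coefficients)."""
    d = len(s)
    a = [rng.randrange(q) for _ in range(d)]
    e = [rng.randint(-bound, bound) for _ in range(d)]
    b = [(x + 2 * y + z) % q for x, y, z in zip(poly_mul(a, s, q), e, m)]
    return a, b

def decrypt(ct, s, q):
    """Recover m as [[b - a*s]_q]_2, coefficient by coefficient."""
    a, b = ct
    return [centered(x - y, q) % 2 for x, y in zip(b, poly_mul(a, s, q))]

rng = random.Random(6)
q, d = 12289, 16          # odd q, so t = 2 is relatively prime to q; toy sizes only
s = [rng.choice([-1, 0, 1]) for _ in range(d)]
m = [rng.randrange(2) for _ in range(d)]
ct = encrypt(m, s, q, bound=3, rng=rng)
print(decrypt(ct, s, q) == m)             # True
```

Multiplying the noise by 2 keeps it even, so reducing the centered value mod 2 strips it off and leaves exactly $m$, provided $|2e + m| < q/2$.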

3  (Leveled)  FHE  without  Bootstrapping:  Our  Construction 

The  plan  of  this  section  is  to  present  our  leveled  FHE-without-bootstrapping  construction  in  modular  steps. 
First,  we  describe  a  plain  GLWE-based  encryption  scheme  with  no  homomorphic  operations.  Next,  we 
describe variants of the “relinearization” and “dimension reduction” techniques of [3]. Finally, in Section
3.4,  we  lay  out  our  construction  of  FHE  without  bootstrapping. 

3.1  Basic  Encryption  Scheme 

We begin by presenting a basic GLWE-based encryption scheme with no homomorphic operations. Let $\lambda$ be
the security parameter, representing $2^\lambda$ security against known attacks. ($\lambda = 100$ is a reasonable value.)

Let $R = R(\lambda)$ be a ring. For example, one may use $R = \mathbb{Z}$ if one wants a scheme based on (standard)
LWE, or one may use $R = \mathbb{Z}[x]/(f(x))$ where (e.g.) $f(x) = x^d + 1$ and $d = d(\lambda)$ is a power of 2 if one wants
a scheme based on RLWE. Let the “dimension” $n = n(\lambda)$, an odd modulus $q = q(\lambda)$, a “noise” distribution
$\chi = \chi(\lambda)$ over $R$, and an integer $N = N(\lambda)$ be additional parameters of the system. These parameters
come from the GLWE assumption, except for $N$, which is set to be larger than $(2n + 1)\log q$. Note that
$n = 1$ in the RLWE instantiation. For simplicity, assume for now that the plaintext space is $R_2 = R/2R$,
though larger plaintext spaces are certainly possible.

We go ahead and stipulate here (even though it only becomes important when we introduce homomorphic
operations) that the noise distribution $\chi$ is set to be as small as possible. Specifically, to base security
on LWE or GLWE, one must use (typically Gaussian) noise distributions with deviation at least some sub-linear
function of $d$ or $n$, and we will let $\chi$ be a noise distribution that barely satisfies that requirement. To
achieve $2^\lambda$ security against known lattice attacks, one must have $n \cdot d = \Omega(\lambda \cdot \log(q/B))$ where $B$ is a bound
on the length of the noise. Since $n$ or $d$ depends logarithmically on $q$, and since the distribution $\chi$ (and hence
$B$) depends sub-linearly on $n$ or $d$, the distribution $\chi$ (and hence $B$) depends sub-logarithmically on $q$. This
dependence is weak, and one should think of the noise distribution as being essentially independent of $q$.

Here is a basic GLWE-based encryption scheme with no homomorphic operations:

Basic GLWE-Based Encryption Scheme:

• E.Setup($1^\lambda$, $1^\mu$, $b$): Use the bit $b \in \{0, 1\}$ to determine whether we are setting parameters for an LWE-based scheme (where $d = 1$) or an RLWE-based scheme (where $n = 1$). Choose a $\mu$-bit modulus $q$ and choose the other parameters ($d = d(\lambda, \mu, b)$, $n = n(\lambda, \mu, b)$, $N = \lceil (2n + 1)\log q \rceil$, $\chi = \chi(\lambda, \mu, b)$) appropriately to ensure that the scheme is based on a GLWE instance that achieves $2^\lambda$ security against known attacks. Let $R = \mathbb{Z}[x]/(x^d + 1)$ and let params $= (q, d, n, N, \chi)$.

• E.SecretKeyGen(params): Draw $s' \leftarrow \chi^n$. Set sk $= s \leftarrow (1, s'[1], \ldots, s'[n]) \in R_q^{n+1}$.

• E.PublicKeyGen(params, sk): Takes as input a secret key sk $= s = (1, s')$ with $s[0] = 1$ and $s' \in R_q^n$, and the params. Generate a matrix $A' \leftarrow R_q^{N \times n}$ uniformly and a vector $e \leftarrow \chi^N$, and set $b \leftarrow A's' + 2e$. Set $A$ to be the $(n + 1)$-column matrix consisting of $b$ followed by the $n$ columns of $-A'$. (Observe: $A \cdot s = 2e$.) Set the public key pk $= A$.

• E.Enc(params, pk, $m$): To encrypt a message $m \in R_2$, set $\mathbf{m} \leftarrow (m, 0, \ldots, 0) \in R_q^{n+1}$, sample $r \leftarrow R_2^N$ and output the ciphertext $c \leftarrow \mathbf{m} + A^T r \in R_q^{n+1}$.

• E.Dec(params, sk, $c$): Output $m \leftarrow [[\langle c, s\rangle]_q]_2$.

Correctness is easy to see, and it is straightforward to base security on special cases (depending on the parameters) of the GLWE assumption (and one can find such proofs of special cases in prior work).
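A minimal Python sketch of the basic scheme in its LWE instantiation ($R = \mathbb{Z}$, so $d = 1$), to make the four algorithms concrete. It rests on loudly stated assumptions: the noise distribution $\chi$ is replaced by a uniform choice from $\{-1, 0, 1\}$, the parameters ($q = 2^{31} - 1$, $n = 8$) are purely illustrative and offer no security, and all function names are hypothetical.

```python
import random

def centered(x, q):
    """Return [x]_q, the representative of x modulo q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def small(rng):
    # Toy stand-in for the noise distribution chi (assumption: uniform on {-1, 0, 1}).
    return rng.choice((-1, 0, 1))

def secret_keygen(n, rng):
    # sk = s = (1, s'[1], ..., s'[n]) with s' drawn from the toy chi^n.
    return [1] + [small(rng) for _ in range(n)]

def public_keygen(s, N, q, rng):
    # Row i of A is (b_i, -a'_i) with b_i = <a'_i, s'> + 2 e_i, so A . s = 2e (mod q).
    A = []
    for _ in range(N):
        a_prime = [rng.randrange(q) for _ in range(len(s) - 1)]
        b = (dot(a_prime, s[1:]) + 2 * small(rng)) % q
        A.append([b] + [(-a) % q for a in a_prime])
    return A

def encrypt(A, m, q, rng):
    # c = (m, 0, ..., 0) + A^T r mod q, with r uniform in {0, 1}^N.
    r = [rng.randrange(2) for _ in range(len(A))]
    c = [sum(row[k] * ri for row, ri in zip(A, r)) % q for k in range(len(A[0]))]
    c[0] = (c[0] + m) % q
    return c

def decrypt(s, c, q):
    # m = [[<c, s>]_q]_2
    return centered(dot(c, s), q) % 2

# Illustrative toy parameters (LWE case, d = 1); they provide no real security.
rng = random.Random(2012)
q, n = 2**31 - 1, 8
N = (2 * n + 1) * q.bit_length()   # (2n + 1) * ceil(log q) = 17 * 31 = 527
s = secret_keygen(n, rng)
A = public_keygen(s, N, q, rng)
```

With these toy parameters the encryption noise $m + 2\langle r, e\rangle$ has magnitude at most $1 + 2N$, far below $q/2$, so E.Dec recovers $m$; for example, `decrypt(s, encrypt(A, 1, q, rng), q)` gives 1.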

3.2 Key Switching (Dimension Reduction)

We start by reminding the reader that in the basic GLWE-based encryption scheme above, the decryption equation for a ciphertext $c$ that encrypts $m$ under key $s$ can be written as $m = [[L_c(s)]_q]_2$, where $L_c(x)$ is a ciphertext-dependent linear equation over the coefficients of $x$ given by $L_c(x) = \langle c, x\rangle$.

Suppose now that we have two ciphertexts $c_1$ and $c_2$, encrypting $m_1$ and $m_2$ respectively under the same secret key $s$. The way homomorphic multiplication is accomplished in [3] is to consider the quadratic equation $Q_{c_1,c_2}(x) \leftarrow L_{c_1}(x) \cdot L_{c_2}(x)$. Assuming the noises of the initial ciphertexts are small enough, we obtain $m_1 \cdot m_2 = [[Q_{c_1,c_2}(s)]_q]_2$, as desired. If one wishes, one can view $Q_{c_1,c_2}(x)$ as a linear equation $L^{\mathrm{long}}_{c_1,c_2}(x \otimes x)$ over the coefficients of $x \otimes x$ (that is, the tensoring of $x$ with itself), where the dimension of $x \otimes x$ is roughly the square of that of $x$. Using this interpretation, the ciphertext represented by the coefficients of the linear equation $L^{\mathrm{long}}$ is decryptable by the long secret key $s_1 \otimes s_1$ via the usual dot product. Of course, we cannot continue increasing the dimension like this indefinitely and preserve efficiency.

Thus, Brakerski and Vaikuntanathan convert the long ciphertext represented by the linear equation $L^{\mathrm{long}}$ and decryptable by the long tensored secret key $s_1 \otimes s_1$ into a shorter ciphertext $c_2$ that is decryptable by a different secret key $s_2$. (The secret keys need to be different to avoid a "circular security" issue.) Encryptions of $s_1 \otimes s_1$ under $s_2$ are provided in the public key as a "hint" to facilitate this conversion.

We observe that Brakerski and Vaikuntanathan's relinearization / dimension reduction procedures are actually quite a bit more general. They can be used not only to reduce the dimension of the ciphertext but, more generally, to transform a ciphertext $c_1$ that is decryptable under one secret key vector $s_1$ into a different ciphertext $c_2$ that encrypts the same message but is decryptable under a second secret key vector $s_2$. The vectors $c_2, s_2$ need not be of lower degree or dimension than $c_1, s_1$.

Below, we review the concrete details of Brakerski and Vaikuntanathan's key switching procedures. The procedures will use some subroutines that, given two vectors $c$ and $s$, "expand" these vectors to get longer (higher-dimensional) vectors $c'$ and $s'$ such that $\langle c', s'\rangle = \langle c, s\rangle \bmod q$. We describe these subroutines first.

• BitDecomp($x \in R_q^n$, $q$) decomposes $x$ into its bit representation. Namely, write $x = \sum_{j=0}^{\lfloor \log q \rfloor} 2^j \cdot u_j$, where all of the vectors $u_j$ are in $R_2^n$, and output $(u_0, u_1, \ldots, u_{\lfloor \log q \rfloor}) \in R_2^{n \cdot \lceil \log q \rceil}$.

• Powersof2($x \in R_q^n$, $q$) outputs the vector $(x, 2 \cdot x, \ldots, 2^{\lfloor \log q \rfloor} \cdot x) \in R_q^{n \cdot \lceil \log q \rceil}$.

If one knows a priori that $x$ has coefficients in $[0, B]$ for $B \ll q$, then BitDecomp can be optimized in the obvious way to output a shorter decomposition in $R_2^{n \cdot \lceil \log B \rceil}$. Observe that:

Lemma 2. For vectors $c, s$ of equal length, we have $\langle \mathrm{BitDecomp}(c, q), \mathrm{Powersof2}(s, q)\rangle = \langle c, s\rangle \bmod q$.

Proof.
$$\langle \mathrm{BitDecomp}(c, q), \mathrm{Powersof2}(s, q)\rangle = \sum_{j=0}^{\lfloor \log q \rfloor} \langle u_j, 2^j \cdot s\rangle = \sum_{j=0}^{\lfloor \log q \rfloor} \langle 2^j \cdot u_j, s\rangle = \Big\langle \sum_{j=0}^{\lfloor \log q \rfloor} 2^j \cdot u_j,\ s \Big\rangle = \langle c, s\rangle.$$
□

We remark that this obviously generalizes to decompositions with respect to bases other than the powers of 2.
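The two subroutines and the identity of Lemma 2 can be checked numerically with a short Python sketch over $R = \mathbb{Z}$. The function names and the component ordering (all bit-0 entries first, then all bit-1 entries, and so on, matching $(u_0, u_1, \ldots)$) are our illustrative choices, and entries are assumed to be reduced into $[0, q)$ before decomposition.

```python
import random

def bit_decomp(x, q):
    """BitDecomp(x, q) over Z: (u_0, ..., u_{ell-1}) flattened, u_j = j-th bits of x."""
    ell = q.bit_length()               # floor(log q) + 1 = ceil(log q) when q is not a power of 2
    x = [xi % q for xi in x]           # use representatives in [0, q)
    return [(xi >> j) & 1 for j in range(ell) for xi in x]

def powers_of_2(s, q):
    """Powersof2(s, q) over Z: (s, 2s, 4s, ..., 2^(ell-1) s) mod q, flattened the same way."""
    ell = q.bit_length()
    return [(si << j) % q for j in range(ell) for si in s]

def dot_mod(u, v, q):
    return sum(a * b for a, b in zip(u, v)) % q

rng = random.Random(7)
q = 2**31 - 1
c = [rng.randrange(q) for _ in range(5)]
s = [rng.randrange(-3, 4) for _ in range(5)]   # short s; negative entries are allowed
lhs = dot_mod(bit_decomp(c, q), powers_of_2(s, q), q)
rhs = dot_mod(c, s, q)
# Swapped roles, as used later in the Expand step: <Powersof2(c), BitDecomp(s)>.
rhs_swapped = dot_mod(powers_of_2(c, q), bit_decomp(s, q), q)
```

Both `lhs` and `rhs_swapped` equal `rhs`, since each side reassembles one vector from its bits while the other carries the matching powers of 2.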

Now, key switching consists of two procedures: first, a procedure SwitchKeyGen($s_1$, $s_2$, $n_1$, $n_2$, $q$), which takes as input the two secret key vectors, the respective dimensions of these vectors, and the modulus $q$, and outputs some auxiliary information $\tau_{s_1 \to s_2}$ that enables the switching; and second, a procedure SwitchKey($\tau_{s_1 \to s_2}$, $c_1$, $n_1$, $n_2$, $q$), which takes this auxiliary information and a ciphertext encrypted under $s_1$ and outputs a new ciphertext $c_2$ that encrypts the same message under the secret key $s_2$. (Below, we often suppress the additional arguments $n_1$, $n_2$, $q$.)

SwitchKeyGen($s_1 \in R_q^{n_1}$, $s_2 \in R_q^{n_2}$):

1. Run $A \leftarrow$ E.PublicKeyGen($s_2$, $N$) for $N = n_1 \cdot \lceil \log q \rceil$.

2. Set $B \leftarrow A + \mathrm{Powersof2}(s_1)$. (Add $\mathrm{Powersof2}(s_1) \in R_q^N$ to $A$'s first column.) Output $\tau_{s_1 \to s_2} = B$.

SwitchKey($\tau_{s_1 \to s_2}$, $c_1$): Output $c_2 = \mathrm{BitDecomp}(c_1)^T \cdot B \in R_q^{n_2}$.

Note that, in SwitchKeyGen, the matrix $A$ basically consists of encryptions of 0 under the key $s_2$. Then, pieces of the key $s_1$ are added to these encryptions of 0. Thus, in some sense, the matrix $B$ consists of encryptions of pieces of $s_1$ (in a certain format) under the key $s_2$. We now establish that the key switching procedures are meaningful, in the sense that they preserve the correctness of decryption under the new key.

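Lemma 3 (Correctness). Let $s_1, s_2, q, n_1, n_2, A, B = \tau_{s_1 \to s_2}$ be as in SwitchKeyGen($s_1$, $s_2$), and let $A \cdot s_2 = 2e_2 \in R_q^N$. Let $c_1 \in R_q^{n_1}$ and $c_2 \leftarrow$ SwitchKey($\tau_{s_1 \to s_2}$, $c_1$). Then,
$$\langle c_2, s_2\rangle = 2\langle \mathrm{BitDecomp}(c_1), e_2\rangle + \langle c_1, s_1\rangle \bmod q.$$

Proof.
$$\begin{aligned}
\langle c_2, s_2\rangle &= \mathrm{BitDecomp}(c_1)^T \cdot B \cdot s_2 \\
&= \mathrm{BitDecomp}(c_1)^T \cdot (2e_2 + \mathrm{Powersof2}(s_1)) && \text{(since } s_2[0] = 1\text{)} \\
&= 2\langle \mathrm{BitDecomp}(c_1), e_2\rangle + \langle \mathrm{BitDecomp}(c_1), \mathrm{Powersof2}(s_1)\rangle \\
&= 2\langle \mathrm{BitDecomp}(c_1), e_2\rangle + \langle c_1, s_1\rangle && \text{(by Lemma 2).}
\end{aligned}$$
□

Note that the dot product of BitDecomp($c_1$) and $e_2$ is small, since BitDecomp($c_1$) is in $R_2^N$. Overall, we have that $c_2$ is a valid encryption of $m$ under key $s_2$, with noise that is larger by a small additive factor.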
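Continuing the toy LWE setting ($R = \mathbb{Z}$), this self-contained Python sketch implements SwitchKeyGen and SwitchKey as described above and checks Lemma 3: a ciphertext under $s_1$ becomes one under $s_2$ whose noise changes by exactly $2\langle \mathrm{BitDecomp}(c_1), e_2\rangle$, at most $2N$ in magnitude here. The $\{-1, 0, 1\}$ noise stand-in, the modulus, the key dimensions, the symmetric-key helper `toy_encrypt` and all other names are illustrative assumptions.

```python
import random

def centered(x, q):
    """Return [x]_q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bit_decomp(x, q):
    # BitDecomp over Z, layers ordered (u_0, u_1, ...), entries reduced into [0, q) first.
    return [((xi % q) >> j) & 1 for j in range(q.bit_length()) for xi in x]

def powers_of_2(x, q):
    # Powersof2 over Z: (x, 2x, 4x, ...) mod q, same layer ordering as bit_decomp.
    return [(xi << j) % q for j in range(q.bit_length()) for xi in x]

def toy_encrypt(s, m, e, q, rng):
    # Hypothetical symmetric-key helper: c = (<a, s[1:]> + m + 2e, -a), so <c, s> = m + 2e mod q.
    a = [rng.randrange(q) for _ in range(len(s) - 1)]
    return [(dot(a, s[1:]) + m + 2 * e) % q] + [(-ai) % q for ai in a]

def switch_keygen(s1, s2, q, rng):
    # A: N = len(s1) * ceil(log q) encryptions of 0 under s2 (noise 2e, e in {-1, 0, 1}).
    # B: A with Powersof2(s1) added to its first column.
    B = []
    for p_k in powers_of_2(s1, q):
        row = toy_encrypt(s2, 0, rng.choice((-1, 0, 1)), q, rng)
        row[0] = (row[0] + p_k) % q
        B.append(row)
    return B

def switch_key(B, c1, q):
    # c2 = BitDecomp(c1)^T . B mod q
    bits = bit_decomp(c1, q)
    return [sum(b * row[k] for b, row in zip(bits, B)) % q for k in range(len(B[0]))]

rng = random.Random(3)
q = 2**31 - 1
s1 = [1] + [rng.choice((-1, 0, 1)) for _ in range(6)]   # source key, 7 entries
s2 = [1] + [rng.choice((-1, 0, 1)) for _ in range(4)]   # target key, 5 entries
tau = switch_keygen(s1, s2, q, rng)                     # tau_{s1 -> s2}: 7 * 31 rows
```

For example, a ciphertext `c1 = toy_encrypt(s1, 1, 50, q, rng)` has noise 101 under `s1`, and `switch_key(tau, c1, q)` yields a 5-entry ciphertext under `s2` whose noise is still odd, so it still decrypts to 1.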

3.3 Modulus Switching

Suppose $c$ is a valid encryption of $m$ under $s$ modulo $q$ (i.e., $m = [[\langle c, s\rangle]_q]_2$), and that $s$ is a short vector. Suppose also that $c'$ is basically a simple scaling of $c$; in particular, $c'$ is the $R$-vector closest to $(p/q) \cdot c$ such that $c' = c \bmod 2$. Then, it turns out (subject to some qualifications) that $c'$ is a valid encryption of $m$ under $s$ modulo $p$ using the usual decryption equation, that is, $m = [[\langle c', s\rangle]_p]_2$. In other words, we can change the inner modulus in the decryption equation (e.g., to a smaller number) while preserving the correctness of decryption under the same secret key! The essence of this modulus switching idea, a variant of Brakerski and Vaikuntanathan's modulus reduction technique, is formally captured in Lemma 4 below.

Definition 6 (Scale). For integer vector $x$ and integers $q > p > r$, we define $x' \leftarrow \mathrm{Scale}(x, q, p, r)$ to be the $R$-vector closest to $(p/q) \cdot x$ that satisfies $x' = x \bmod r$.

Definition 7 ($\ell_1^{(R)}$ norm). The (usual) norm $\ell_1(s)$ over the reals equals $\sum_i |s[i]|$. We extend this to our ring $R$ as follows: $\ell_1^{(R)}(s)$ for $s \in R^n$ is defined as $\sum_i \|s[i]\|$.

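Lemma 4. Let $d$ be the degree of the ring (e.g., $d = 1$ when $R = \mathbb{Z}$). Let $q > p > r$ be positive integers satisfying $q = p = 1 \bmod r$. Let $c \in R^n$ and $c' \leftarrow \mathrm{Scale}(c, q, p, r)$. Then, for any $s \in R^n$ with
$$\|[\langle c, s\rangle]_q\| < q/2 - (q/p) \cdot (r/2) \cdot \sqrt{d} \cdot \gamma(R) \cdot \ell_1^{(R)}(s),$$
we have
$$[\langle c', s\rangle]_p = [\langle c, s\rangle]_q \bmod r \quad \text{and} \quad \|[\langle c', s\rangle]_p\| < (p/q) \cdot \|[\langle c, s\rangle]_q\| + (r/2) \cdot \sqrt{d} \cdot \gamma(R) \cdot \ell_1^{(R)}(s).$$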

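Proof (Lemma 4). We have
$$[\langle c, s\rangle]_q = \langle c, s\rangle - kq$$
for some $k \in R$. For the same $k$, let
$$e_p = \langle c', s\rangle - kp \in R.$$
Note that $e_p = [\langle c', s\rangle]_p \bmod p$. We claim that $\|e_p\|$ is so small that $e_p = [\langle c', s\rangle]_p$. We have:
$$\begin{aligned}
\|e_p\| &= \|-kp + \langle (p/q) \cdot c, s\rangle + \langle c' - (p/q) \cdot c, s\rangle\| \\
&\le \|-kp + \langle (p/q) \cdot c, s\rangle\| + \|\langle c' - (p/q) \cdot c, s\rangle\| \\
&\le (p/q) \cdot \|[\langle c, s\rangle]_q\| + \gamma(R) \cdot \sum_{j=1}^{n} \|c'[j] - (p/q) \cdot c[j]\| \cdot \|s[j]\| \\
&\le (p/q) \cdot \|[\langle c, s\rangle]_q\| + \gamma(R) \cdot (r/2) \cdot \sqrt{d} \cdot \ell_1^{(R)}(s) \\
&< p/2.
\end{aligned}$$
Here the third line uses $-kp + (p/q) \cdot \langle c, s\rangle = (p/q) \cdot [\langle c, s\rangle]_q$; the fourth uses the fact that, by the definition of Scale, each of the $d$ coefficients of $c'[j] - (p/q) \cdot c[j]$ has magnitude at most $r/2$; and the last follows from the hypothesis on $\|[\langle c, s\rangle]_q\|$ after multiplying by $p/q$. Furthermore, modulo $r$, we have $[\langle c', s\rangle]_p = e_p = \langle c', s\rangle - kp = \langle c, s\rangle - kq = [\langle c, s\rangle]_q$, since $c' = c \bmod r$ and $p = q = 1 \bmod r$. □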

The lemma implies that an evaluator, who does not know the secret key but instead only knows a bound on its length, can potentially transform a ciphertext $c$ that encrypts $m$ under key $s$ for modulus $q$ (i.e., $m = [[\langle c, s\rangle]_q]_r$) into a ciphertext $c'$ that encrypts $m$ under the same key $s$ for modulus $p$ (i.e., $m = [[\langle c', s\rangle]_p]_r$). Specifically, the following corollary follows immediately from Lemma 4.

Corollary 1. Let $p$ and $q$ be two odd moduli. Suppose $c$ is an encryption of bit $m$ under key $s$ for modulus $q$, i.e., $m = [[\langle c, s\rangle]_q]_r$. Moreover, suppose that $s$ is a fairly short key and the "noise" $e_q \leftarrow [\langle c, s\rangle]_q$ has small magnitude; precisely, assume that $\|e_q\| < q/2 - (q/p) \cdot (r/2) \cdot \sqrt{d} \cdot \gamma(R) \cdot \ell_1^{(R)}(s)$. Then $c' \leftarrow \mathrm{Scale}(c, q, p, r)$ is an encryption of bit $m$ under key $s$ for modulus $p$, i.e., $m = [[\langle c', s\rangle]_p]_r$. The noise $e_p = [\langle c', s\rangle]_p$ of the new ciphertext has magnitude at most $(p/q) \cdot \|[\langle c, s\rangle]_q\| + \gamma(R) \cdot (r/2) \cdot \sqrt{d} \cdot \ell_1^{(R)}(s)$.

Amazingly, assuming $p$ is smaller than $q$ and $s$ has coefficients that are small in relation to $q$, this trick permits the evaluator to reduce the magnitude of the noise without knowing the secret key! (Of course, this is also what Gentry's bootstrapping transformation accomplishes, but in a much more complicated way.)
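A small Python sketch of Scale (Definition 6) for integer vectors, plus a numerical check of Corollary 1 in the toy LWE setting ($R = \mathbb{Z}$, $d = 1$, $\gamma(R) = 1$, $r = 2$). The ciphertext comes from a hypothetical symmetric-key construction with $\langle c, s\rangle = m + 2e \bmod q$, and the moduli $q = 2^{31} - 1$ and $p = 2^{15} - 1$ are illustrative; both are odd, so $q = p = 1 \bmod 2$ as Lemma 4 requires. Rounding ties are broken by Python's round-half-to-even, an arbitrary choice; any nearest vector satisfies the bound used in the proof. The secret key is used only to measure the noise, never to perform the switch.

```python
import random
from fractions import Fraction

def centered(x, q):
    """Return [x]_q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(x, q, p, r):
    """Scale(x, q, p, r): integer vector closest to (p/q) x with x' = x mod r."""
    out = []
    for xi in x:
        target = Fraction(p * xi, q)
        k = round((target - xi) / r)   # x' = xi + r*k stays congruent to xi mod r
        out.append(xi + r * k)         # and satisfies |x' - target| <= r/2
    return out

rng = random.Random(11)
q, p, r = 2**31 - 1, 2**15 - 1, 2
s = [1] + [rng.choice((-1, 0, 1)) for _ in range(8)]   # short secret key
l1 = sum(abs(si) for si in s)                           # ell_1(s)

# Hypothetical symmetric-key ciphertext with large noise: <c, s> = m + 2e (mod q).
m, e = 1, 987_654
a = [rng.randrange(q) for _ in range(8)]
c = [(dot(a, s[1:]) + m + 2 * e) % q] + [(-ai) % q for ai in a]

c_p = scale(c, q, p, r)
e_q = centered(dot(c, s), q)      # old noise: m + 2e = 1975309
e_p = centered(dot(c_p, s), p)    # new noise after switching to modulus p
```

Corollary 1 bounds the new noise by $(p/q) \cdot 1975309 + \ell_1(s) \le 30.2 + 9$, so it drops from about $2 \times 10^6$ to below 40 while keeping its parity, and hence the plaintext bit.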

3.4 (Leveled) FHE Based on GLWE without Bootstrapping

We now present our FHE scheme. Given the machinery that we have described in the previous subsections, the scheme itself is remarkably simple.

In our scheme, we will use a parameter $L$ indicating the number of levels of arithmetic circuit that we want our FHE scheme to be capable of evaluating. Note that this is an exponential improvement over prior schemes, which would typically use a parameter $d$ indicating the degree of the polynomials to be evaluated.

(Note: the linear polynomial $L^{\mathrm{long}}_{c_1,c_2}$, used below, is defined in Section 3.2.)

Our FHE Scheme without Bootstrapping:

• FHE.Setup($1^\lambda$, $1^L$, $b$): Takes as input the security parameter, a number of levels $L$, and a bit $b$. Use the bit $b \in \{0, 1\}$ to determine whether we are setting parameters for an LWE-based scheme (where $d = 1$) or an RLWE-based scheme (where $n = 1$). Let $\mu = \mu(\lambda, L, b) = \theta(\log \lambda + \log L)$ be a parameter that we will specify in detail later. For $j = L$ (input level of circuit) down to 0 (output level), run params$_j \leftarrow$ E.Setup($1^\lambda$, $1^{(j+1) \cdot \mu}$, $b$) to obtain a ladder of decreasing moduli from $q_L$ ($(L + 1) \cdot \mu$ bits) down to $q_0$ ($\mu$ bits). For $j = L - 1$ down to 0, replace the value of $d_j$ in params$_j$ with $d = d_L$ and the distribution $\chi_j$ with $\chi = \chi_L$. (That is, the ring dimension and noise distribution do not depend on the circuit level, but the vector dimension $n_j$ still might.)

• FHE.KeyGen($\{\mathrm{params}_j\}$): For $j = L$ down to 0, do the following:

1. Run $s_j \leftarrow$ E.SecretKeyGen(params$_j$) and $A_j \leftarrow$ E.PublicKeyGen(params$_j$, $s_j$).

2. Set $s'_j \leftarrow s_j \otimes s_j \in R_{q_j}^{\binom{n_j+1}{2}}$. That is, $s'_j$ is a tensoring of $s_j$ with itself whose coefficients are each the product of two coefficients of $s_j$ in $R_{q_j}$.

3. Set $s''_j \leftarrow$ BitDecomp($s'_j$, $q_j$).

4. Run $\tau_{s''_{j+1} \to s_j} \leftarrow$ SwitchKeyGen($s''_{j+1}$, $s_j$). (Omit this step when $j = L$.)

The secret key sk consists of the $s_j$'s and the public key pk consists of the $A_j$'s and $\tau_{s''_{j+1} \to s_j}$'s.

• FHE.Enc(params, pk, $m$): Take a message in $R_2$. Run E.Enc($A_L$, $m$).

• FHE.Dec(params, sk, $c$): Suppose the ciphertext is under key $s_j$. Run E.Dec($s_j$, $c$). (The ciphertext could be augmented with an index indicating which level it belongs to.)

• FHE.Add(pk, $c_1$, $c_2$): Takes two ciphertexts encrypted under the same $s_j$. (If they are not initially, use FHE.Refresh (below) to make it so.) Set $c_3 \leftarrow c_1 + c_2 \bmod q_j$. Interpret $c_3$ as a ciphertext under $s'_j$ ($s'_j$'s coefficients include all of $s_j$'s, since $s'_j = s_j \otimes s_j$ and $s_j$'s first coefficient is 1) and output:
$$c_4 \leftarrow \mathrm{FHE.Refresh}(c_3, \tau_{s''_j \to s_{j-1}}, q_j, q_{j-1}).$$

• FHE.Mult(pk, $c_1$, $c_2$): Takes two ciphertexts encrypted under the same $s_j$. (If they are not initially, use FHE.Refresh (below) to make it so.) First, multiply: the new ciphertext, under the secret key $s'_j = s_j \otimes s_j$, is the coefficient vector $c_3$ of the linear equation $L^{\mathrm{long}}_{c_1,c_2}(x \otimes x)$. Then, output:
$$c_4 \leftarrow \mathrm{FHE.Refresh}(c_3, \tau_{s''_j \to s_{j-1}}, q_j, q_{j-1}).$$

• FHE.Refresh($c$, $\tau_{s''_j \to s_{j-1}}$, $q_j$, $q_{j-1}$): Takes a ciphertext encrypted under $s'_j$, the auxiliary information $\tau_{s''_j \to s_{j-1}}$ to facilitate key switching, and the current and next moduli $q_j$ and $q_{j-1}$. Do the following:

1. Expand: Set $c_1 \leftarrow$ Powersof2($c$, $q_j$). (Observe: $\langle c_1, s''_j\rangle = \langle c, s'_j\rangle \bmod q_j$ by Lemma 2.)

2. Switch Moduli: Set $c_2 \leftarrow$ Scale($c_1$, $q_j$, $q_{j-1}$, 2), a ciphertext under the key $s''_j$ for modulus $q_{j-1}$.

3. Switch Keys: Output $c_3 \leftarrow$ SwitchKey($\tau_{s''_j \to s_{j-1}}$, $c_2$, $q_{j-1}$), a ciphertext under the key $s_{j-1}$ for modulus $q_{j-1}$.


Remark 1. We mention the obvious fact that, since addition increases the noise much more slowly than multiplication, one does not necessarily need to refresh after additions, even high fan-in ones.

The key step of our new FHE scheme is the Refresh procedure. If the modulus $q_{j-1}$ is chosen to be smaller than $q_j$ by a sufficient multiplicative factor, then Corollary 1 implies that the noise of the ciphertext output by Refresh is smaller than that of the input ciphertext; that is, the ciphertext will indeed be a "refreshed" encryption of the same value. We elaborate on this analysis in the next section.

One can reasonably argue that this scheme is not "FHE without bootstrapping", since $\tau_{s''_j \to s_{j-1}}$ can be viewed as an encrypted secret key, and the SwitchKey step can be viewed as a homomorphic evaluation of the decryption function. We prefer not to view the SwitchKey step this way. While there is some high-level resemblance, the low-level details are very different, a difference that becomes tangible in the much better asymptotic performance. To the extent that it performs decryption, SwitchKey does so very efficiently using an efficient (not bit-wise) representation of the secret key that allows this step to be computed in quasi-linear time for the RLWE instantiation, below the quadratic lower bound for bootstrapping. Certainly SwitchKey does not use the usual ponderous approach of representing the decryption function as a boolean circuit to be traversed homomorphically. Another difference is that the SwitchKey step does not actually reduce the noise level (as bootstrapping does); rather, the noise is reduced by the Scale step.
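To see how the pieces of FHE.Mult fit together, the following Python sketch runs one multiplication from level $j = 1$ to level $j = 0$ in a toy LWE setting ($R = \mathbb{Z}$): it tensors the ciphertexts and then applies the three Refresh steps (Expand, Switch Moduli, Switch Keys). Several simplifications are assumptions of the sketch, not the scheme as specified: the full tensor $s \otimes s$ with $(n+1)^2$ entries instead of $\binom{n_j+1}{2}$, unoptimized BitDecomp and Powersof2, a uniform $\{-1, 0, 1\}$ stand-in for $\chi$, symmetric-key toy encryption in place of E.Enc, and small illustrative moduli ($q_1 = 2^{40} + 1$, $q_0 = 2^{20} + 1$) with no security. All function names are hypothetical.

```python
import random
from fractions import Fraction

def centered(x, q):
    """Return [x]_q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tensor(u, v):
    # Full row-major tensor product (simplification: (n+1)^2 entries).
    return [a * b for a in u for b in v]

def bit_decomp(x, q):
    return [((xi % q) >> j) & 1 for j in range(q.bit_length()) for xi in x]

def powers_of_2(x, q):
    return [(xi << j) % q for j in range(q.bit_length()) for xi in x]

def scale(x, q, p):
    # Scale(x, q, p, 2): x + 2k nearest to (p/q) x, so the result agrees with x mod 2.
    return [xi + 2 * round(Fraction((p - q) * xi, 2 * q)) for xi in x]

def keygen(n, rng):
    return [1] + [rng.choice((-1, 0, 1)) for _ in range(n)]

def encrypt(s, m, q, rng, noise_bound):
    # Toy symmetric-key encryption: <c, s> = m + 2e mod q with |e| <= noise_bound.
    a = [rng.randrange(q) for _ in range(len(s) - 1)]
    e = rng.randint(-noise_bound, noise_bound)
    return [(dot(a, s[1:]) + m + 2 * e) % q] + [(-ai) % q for ai in a]

def decrypt(s, c, q):
    return centered(dot(c, s), q) % 2

def switch_keygen(s_from, s_to, q, rng):
    # Rows encrypt 0 under s_to (noise in {-2, 0, 2}); Powersof2(s_from) added to column 0.
    tau = []
    for p_k in powers_of_2(s_from, q):
        row = encrypt(s_to, 0, q, rng, 1)
        row[0] = (row[0] + p_k) % q
        tau.append(row)
    return tau

def switch_key(tau, c, q):
    bits = bit_decomp(c, q)
    return [sum(b * row[k] for b, row in zip(bits, tau)) % q for k in range(len(tau[0]))]

def mult(c_a, c_b, tau, q_hi, q_lo):
    c3 = [x % q_hi for x in tensor(c_a, c_b)]        # under s (x) s, modulus q_hi
    c1 = powers_of_2(c3, q_hi)                        # Expand: under s'' = BitDecomp(s (x) s)
    c2 = [x % q_lo for x in scale(c1, q_hi, q_lo)]    # Switch Moduli: q_hi -> q_lo
    return switch_key(tau, c2, q_lo)                  # Switch Keys: s'' -> s_lo

rng = random.Random(5)
n = 2
q1, q0 = 2**40 + 1, 2**20 + 1                # toy two-level ladder of odd moduli
s1, s0 = keygen(n, rng), keygen(n, rng)
s1_dd = bit_decomp(tensor(s1, s1), q1)       # s''_1 = BitDecomp(s_1 (x) s_1, q_1)
tau = switch_keygen(s1_dd, s0, q0, rng)      # tau_{s''_1 -> s_0}, modulo q_0
```

In this toy run the Scale step shrinks the product noise $e_a e_b$ (up to about $4 \times 10^6$) by a factor of roughly $2^{20}$, adding at most $\ell_1(s''_1) \le 369$, while SwitchKey adds up to $2 \cdot 7749$; this matches the remark that the noise is reduced by Scale rather than by SwitchKey.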

4 Correctness, Setting the Parameters, Performance, and Security

Here, we will show how to set the parameters of the scheme so that the scheme is correct. Mostly, this involves analyzing each of the steps within FHE.Add and FHE.Mult (namely, the addition or multiplication itself, and then the Powersof2, Scale and SwitchKey steps that make up FHE.Refresh) to establish that the output of each step is a decryptable ciphertext with bounded noise. This analysis will lead to concrete suggestions for how to set the ladder of moduli and to asymptotic bounds on the performance of the scheme.

Let us begin by considering how much noise FHE.Enc introduces initially.

4.1 The Initial Noise from FHE.Enc

Recall that FHE.Enc simply invokes E.Enc for suitable parameters (params$_L$) that depend on $\lambda$ and $L$. In turn, the noise of ciphertexts output by E.Enc depends on the noise of the initial "ciphertexts" (the encryptions of 0) implicit in the matrix $A$ output by E.PublicKeyGen, whose noise distribution is dictated by the distribution $\chi$.

Lemma 5. Let $n_L$ and $q_L$ be the parameters associated to FHE.Enc. Let $d$ be the dimension of the ring $R$, and let $\gamma_R$ be the expansion factor associated to $R$. (Both of these quantities are 1 when $R = \mathbb{Z}$.) Let $B_\chi$ be a bound such that $R$-elements sampled from the noise distribution $\chi$ have length at most $B_\chi$ with overwhelming probability. The length of the noise in ciphertexts output by FHE.Enc is at most $1 + 2 \cdot \gamma_R \cdot \sqrt{d} \cdot ((2n_L + 1)\log q_L) \cdot B_\chi$.

Proof. Recall that $s \leftarrow$ E.SecretKeyGen and $A \leftarrow$ E.PublicKeyGen($s$, $N$) for $N = (2n_L + 1)\log q_L$, where $A \cdot s = 2e$ for $e \leftarrow \chi^N$. Recall that encryption works as follows: $c \leftarrow \mathbf{m} + A^T r \bmod q$, where $r \in R_2^N$. We have that the noise of this ciphertext is $[\langle c, s\rangle]_q = [m + 2\langle r, e\rangle]_q$, whose magnitude is at most
$$1 + 2 \cdot \gamma_R \cdot \sum_{j=1}^{N} \|r[j]\| \cdot \|e[j]\| \le 1 + 2 \cdot \gamma_R \cdot \sqrt{d} \cdot N \cdot B_\chi,$$
where the last step uses $\|r[j]\| \le \sqrt{d}$ (each $r[j]$ has $d$ coefficients in $\{0, 1\}$) and $\|e[j]\| \le B_\chi$. □

Notice that we are using very loose (i.e., conservative) upper bounds for the noise. These bounds could be tightened up with a more careful analysis. The correctness of decryption for ciphertexts output by FHE.Enc, assuming the noise bound above is less than $q/2$, follows directly from the correctness of the basic encryption and decryption algorithms E.Enc and E.Dec.
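As a sanity check on Lemma 5, and on how conservative it is, this Python sketch evaluates the bound $1 + 2 \gamma_R \sqrt{d}\, ((2n_L + 1)\log q_L)\, B_\chi$ for a toy LWE setting ($d = \gamma_R = 1$) and compares it with sampled values of the actual noise $m + 2\langle r, e\rangle$. The parameter values ($n_L = 8$, $\log q_L = 31$) and the $\{-1, 0, 1\}$ stand-in for $\chi$ (so $B_\chi = 1$) are illustrative assumptions.

```python
import math
import random

def enc_noise_bound(n_L, log_q_L, d=1, gamma_R=1.0, B_chi=1):
    """Lemma 5: 1 + 2 * gamma_R * sqrt(d) * ((2 n_L + 1) log q_L) * B_chi."""
    N = (2 * n_L + 1) * log_q_L
    return 1 + 2 * gamma_R * math.sqrt(d) * N * B_chi

def sample_noise(N, rng, m=1):
    # Actual noise m + 2<r, e> with r in {0,1}^N and e in {-1,0,1}^N (R = Z).
    r = [rng.randrange(2) for _ in range(N)]
    e = [rng.choice((-1, 0, 1)) for _ in range(N)]
    return m + 2 * sum(ri * ei for ri, ei in zip(r, e))

n_L, log_q_L = 8, 31
bound = enc_noise_bound(n_L, log_q_L)          # 1 + 2 * 527 = 1055
rng = random.Random(0)
worst = max(abs(sample_noise((2 * n_L + 1) * log_q_L, rng)) for _ in range(200))
```

Because the terms $r[j] e[j]$ have random signs, the observed maximum is typically a small fraction of the worst-case bound, consistent with the remark that the bounds could be tightened.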

4.2 Correctness and Performance of FHE.Add and FHE.Mult (before FHE.Refresh)

Consider FHE.Mult. One begins FHE.Mult(pk, $c_1$, $c_2$) with two ciphertexts under key $s_j$ for modulus $q_j$ that have noises $e_i = [L_{c_i}(s_j)]_{q_j}$, where $L_{c_i}(x)$ is simply the dot product $\langle c_i, x\rangle$. To multiply together two ciphertexts, one multiplies together these two linear equations to obtain a quadratic equation $Q_{c_1,c_2}(x) \leftarrow L_{c_1}(x) \cdot L_{c_2}(x)$, and then interprets this quadratic equation as a linear equation $L^{\mathrm{long}}_{c_1,c_2}(x \otimes x) = Q_{c_1,c_2}(x)$ over the tensored vector $x \otimes x$. The coefficients of this long linear equation compose the new ciphertext vector $c_3$. Clearly, $[\langle c_3, s_j \otimes s_j\rangle]_{q_j} = [L^{\mathrm{long}}_{c_1,c_2}(s_j \otimes s_j)]_{q_j} = [e_1 \cdot e_2]_{q_j}$. Thus, if the noises of $c_1$ and $c_2$ have length at most $B$, then the noise of $c_3$ has length at most $\gamma_R \cdot B^2$, where $\gamma_R$ is the expansion factor of $R$. If this length is less than $q_j/2$, then decryption works correctly. In particular, if $m_i = [[\langle c_i, s_j\rangle]_{q_j}]_2 = [e_i]_2$ for $i \in \{1, 2\}$, then over $R_2$ we have $[[\langle c_3, s_j \otimes s_j\rangle]_{q_j}]_2 = [[e_1 \cdot e_2]_{q_j}]_2 = [e_1 \cdot e_2]_2 = [e_1]_2 \cdot [e_2]_2 = m_1 \cdot m_2$. That is, correctness is preserved as long as this noise does not wrap modulo $q_j$.

The correctness of FHE.Add and FHE.Mult (before FHE.Refresh) is formally captured in the following lemmas.

Lemma 6. Let $c_1$ and $c_2$ be two ciphertexts under key $s_j$ for modulus $q_j$, where $\|[\langle c_i, s_j\rangle]_{q_j}\| \le B$ and $m_i = [[\langle c_i, s_j\rangle]_{q_j}]_2$. Let $s'_j = s_j \otimes s_j$, where the "non-quadratic coefficients" of $s'_j$ (namely, the '1' and the coefficients of $s_j$) are placed first. Let $c' = c_1 + c_2$, and pad $c'$ with zeros to get a vector $c_3$ such that $\langle c_3, s'_j\rangle = \langle c', s_j\rangle$. The noise $[\langle c_3, s'_j\rangle]_{q_j}$ has length at most $2B$. If $2B < q_j/2$, $c_3$ is an encryption of $m_1 + m_2$ under key $s'_j$ for modulus $q_j$, i.e., $m_1 + m_2 = [[\langle c_3, s'_j\rangle]_{q_j}]_2$.

Lemma 7. Let $c_1$ and $c_2$ be two ciphertexts under key $s_j$ for modulus $q_j$, where $\|[\langle c_i, s_j\rangle]_{q_j}\| \le B$ and $m_i = [[\langle c_i, s_j\rangle]_{q_j}]_2$. Let the linear equation $L^{\mathrm{long}}_{c_1,c_2}(x \otimes x)$ be as defined above, let $c_3$ be the coefficient vector of this linear equation, and let $s'_j = s_j \otimes s_j$. The noise $[\langle c_3, s'_j\rangle]_{q_j}$ has length at most $\gamma_R \cdot B^2$. If $\gamma_R \cdot B^2 < q_j/2$, $c_3$ is an encryption of $m_1 \cdot m_2$ under key $s'_j$ for modulus $q_j$, i.e., $m_1 \cdot m_2 = [[\langle c_3, s'_j\rangle]_{q_j}]_2$.

The computation needed to compute the tensored ciphertext $c_3$ is $\tilde{O}(d n_j^2 \log q_j)$. For the RLWE instantiation, since $n_j = 1$ and since (as we will see) $\log q_j$ depends logarithmically on the security parameter and linearly on $L$, the computation here is only quasi-linear in the security parameter. For the LWE instantiation, the computation is quasi-quadratic.
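Lemmas 6 and 7 can be checked in the toy LWE setting ($R = \mathbb{Z}$, so $\gamma_R = 1$) with the following Python sketch: the padded sum decrypts to $m_1 + m_2$ and the tensored product to $m_1 \cdot m_2$ under $s \otimes s$, with noise exactly $e_1 + e_2$ and $e_1 \cdot e_2$. As simplifying assumptions, it uses the full row-major tensor of dimension $(n+1)^2$, whose first $n + 1$ entries are $s[0] \cdot s = s$ (so zero-padding places $c_1 + c_2$ against the "non-quadratic coefficients"), and a hypothetical symmetric-key helper for encryption.

```python
import random

def centered(x, q):
    """Return [x]_q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tensor(u, v):
    return [a * b for a in u for b in v]   # row-major u (x) v

def toy_encrypt(s, m, e, q, rng):
    # <c, s> = m + 2e (mod q); illustrative symmetric-key helper, not E.Enc.
    a = [rng.randrange(q) for _ in range(len(s) - 1)]
    return [(dot(a, s[1:]) + m + 2 * e) % q] + [(-ai) % q for ai in a]

rng = random.Random(42)
q, n = 2**31 - 1, 4
s = [1] + [rng.choice((-1, 0, 1)) for _ in range(n)]
s_t = tensor(s, s)                        # s' = s (x) s, first n + 1 entries equal s

c1 = toy_encrypt(s, 1, 300, q, rng)       # noise e1 = 601
c2 = toy_encrypt(s, 1, -150, q, rng)      # noise e2 = -299

c_add = [(x + y) % q for x, y in zip(c1, c2)] + [0] * (len(s_t) - len(s))   # Lemma 6 padding
c_mul = [x % q for x in tensor(c1, c2)]                                      # Lemma 7 tensoring
```

Here the sum has noise $302$ (even, so it decrypts to $1 + 1 = 0$ in $R_2$) and the product has noise $601 \cdot (-299) = -179699$ (odd, so it decrypts to $1 \cdot 1 = 1$).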

4.3 Correctness and Performance of FHE.Refresh

FHE.Refresh consists of three steps: Expand, Switch Moduli, and Switch Keys. We address each of these steps in turn.

Correctness and Performance of the Expand Step. The Expand step of FHE.Refresh takes as input a long ciphertext $c$ under the long tensored key $s'_j = s_j \otimes s_j$ for modulus $q_j$. It simply applies the Powersof2 transformation to $c$ to obtain $c_1$. By Lemma 2 (with the roles of the two vectors exchanged), we know that
$$\langle \mathrm{Powersof2}(c, q_j), \mathrm{BitDecomp}(s'_j, q_j)\rangle = \langle c, s'_j\rangle \bmod q_j,$$
i.e., we know that if $s'_j$ decrypts $c$ correctly, then $s''_j$ decrypts $c_1$ correctly. The noise has not been affected at all.

If implemented naively, the computation in the Expand step is $\tilde{O}(d n_j^2 \log^2 q_j)$. The somewhat high computation is due to the fact that the expanded ciphertext is a $(\binom{n_j+1}{2} \cdot \lceil \log q_j \rceil)$-dimensional vector over $R_q$.

However, recall that $s_j$ is drawn using the distribution $\chi$, i.e., it has small coefficients of size basically independent of $q_j$. Consequently, $s'_j$ also has small coefficients, and we can use this a priori knowledge in combination with an optimized version of BitDecomp to output a shorter bit decomposition of $s'_j$; in particular, a $(\binom{n_j+1}{2} \cdot \lceil \log q'_j \rceil)$-dimensional vector over $R_q$, where $q'_j \ll q_j$ is a bound (with overwhelming probability) on the coefficients of $s'_j$, whose entries are products of two elements output by $\chi$. Similarly, we can use an abbreviated version of Powersof2($c$, $q_j$). In this case, the computation is $\tilde{O}(d n_j^2 \log q_j)$.
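The saving can be illustrated with a Python sketch: if every entry of the secret vector is known to lie in $[0, B]$ with $B \ll q$, then decomposing it into only $\lceil \log(B + 1) \rceil$ bits, and truncating Powersof2 to the same number of powers, still preserves the inner product modulo $q$. The sketch assumes non-negative entries, mirroring the $[0, B]$ remark in Section 3.2; the entries of $s'_j$ may be negative in general, and how signs are handled is not spelled out in the text, so this example sidesteps that question. The helper names and parameter values are hypothetical.

```python
import random

def short_bit_decomp(x, num_bits):
    # Optimized BitDecomp for entries known to lie in [0, 2^num_bits): only num_bits layers.
    assert all(0 <= xi < 2 ** num_bits for xi in x)
    return [(xi >> j) & 1 for j in range(num_bits) for xi in x]

def short_powers_of_2(c, q, num_bits):
    # Matching abbreviated Powersof2: (c, 2c, ..., 2^(num_bits - 1) c) mod q.
    return [(ci << j) % q for j in range(num_bits) for ci in c]

rng = random.Random(9)
q, B = 2**61 - 1, 40                 # large modulus, small a-priori bound on secret entries
num_bits = B.bit_length()            # 6 bits suffice for values in [0, 40]
s = [rng.randint(0, B) for _ in range(10)]
c = [rng.randrange(q) for _ in range(10)]
short = short_powers_of_2(c, q, num_bits)
full_len = len(c) * q.bit_length()   # length of the unoptimized expansion
```

The abbreviated expansion has $10 \cdot 6 = 60$ entries instead of $10 \cdot 61 = 610$, yet its inner product with the short decomposition of `s` still equals $\langle c, s\rangle \bmod q$.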

Correctness and Performance of the Switch-Moduli Step. The Switch Moduli step takes as input a ciphertext $c_1$ under the secret bit-vector $s''_j$ for the modulus $q_j$, and outputs the ciphertext $c_2 \leftarrow \mathrm{Scale}(c_1, q_j, q_{j-1}, 2)$, which we claim to be a ciphertext under key $s''_j$ for modulus $q_{j-1}$. Note that $s''_j$ is a short secret key, since it is a bit vector in $R_2^{t_j}$ for $t_j \le \binom{n_j+1}{2} \cdot \lceil \log q_j \rceil$. By Corollary 1 (with $r = 2$), and using the fact that $\ell_1(s''_j) \le \sqrt{d} \cdot t_j$, the following is true: if the noise of $c_1$ has length at most $B < q_j/2 - (q_j/q_{j-1}) \cdot d \cdot \gamma_R \cdot t_j$, then correctness is preserved and the noise of $c_2$ is bounded by $(q_{j-1}/q_j) \cdot B + d \cdot \gamma_R \cdot t_j$. Of course, the key feature of this step for our purposes is that switching moduli may reduce the length of the noise when $q_{j-1} < q_j$.

We capture the correctness of the Switch-Moduli step in the following lemma.

Lemma 8. Let $c_1$ be a ciphertext under the key $s''_j = \mathrm{BitDecomp}(s_j \otimes s_j, q_j)$ such that $e_j \leftarrow [\langle c_1, s''_j\rangle]_{q_j}$ has length at most $B$ and $m = [e_j]_2$. Let $c_2 \leftarrow \mathrm{Scale}(c_1, q_j, q_{j-1}, 2)$, and let $e_{j-1} = [\langle c_2, s''_j\rangle]_{q_{j-1}}$. Then, $e_{j-1}$ (the new noise) has length at most $(q_{j-1}/q_j) \cdot B + d \cdot \gamma_R \cdot \binom{n_j+1}{2} \cdot \lceil \log q_j \rceil$, and (assuming this noise length is less than $q_{j-1}/2$) we have $m = [e_{j-1}]_2$.

The computation in the Switch-Moduli step is $\tilde{O}(d n_j^2 \log q_j)$, using the optimized versions of BitDecomp and Powersof2 mentioned above.

Correctness and Performance of the Switch-Key Step. Finally, in the Switch Keys step, we take as input a ciphertext $c_2$ under key $s''_j$ for modulus $q_{j-1}$ and set $c_3 \leftarrow \mathrm{SwitchKey}(\tau_{s''_j \to s_{j-1}}, c_2, q_{j-1})$, a ciphertext under the key $s_{j-1}$ for modulus $q_{j-1}$. In Lemma 3, we proved the correctness of key switching and established that the noise grows only by the additive factor $2\langle \mathrm{BitDecomp}(c_2, q_{j-1}), e\rangle$, where BitDecomp($c_2$, $q_{j-1}$) is a (short) bit-vector and $e$ is a (short and fresh) noise vector. In particular, if the noise originally had length $B$, then after the Switch Keys step it has length at most
$$B + 2 \cdot \gamma_R \cdot \sum_{i=1}^{u_j} \|\mathrm{BitDecomp}(c_2, q_{j-1})[i]\| \cdot B_\chi \le B + 2 \cdot \gamma_R \cdot u_j \cdot \sqrt{d} \cdot B_\chi,$$
where $u_j \le \binom{n_j+1}{2} \cdot \lceil \log q_j \rceil \cdot \lceil \log q_{j-1} \rceil$ is the dimension of BitDecomp($c_2$).

We capture the correctness of the Switch-Key step in the following lemma.

Lemma 9. Let $c_2$ be a ciphertext under the key $s''_j = \mathrm{BitDecomp}(s_j \otimes s_j, q_j)$ for modulus $q_{j-1}$ such that $e_1 \leftarrow [\langle c_2, s''_j\rangle]_{q_{j-1}}$ has length at most $B$ and $m = [e_1]_2$. Let $c_3 \leftarrow \mathrm{SwitchKey}(\tau_{s''_j \to s_{j-1}}, c_2, q_{j-1})$, and let $e_2 = [\langle c_3, s_{j-1}\rangle]_{q_{j-1}}$. Then, $e_2$ (the new noise) has length at most $B + 2 \cdot \gamma_R \cdot \binom{n_j+1}{2} \cdot \lceil \log q_j \rceil^2 \cdot \sqrt{d} \cdot B_\chi$, and (assuming this noise length is less than $q_{j-1}/2$) we have $m = [e_2]_2$.

In terms of computation, the Switch-Key step involves multiplying the transpose of the $u_j$-dimensional vector BitDecomp($c_2$) with a $u_j \times (n_{j-1} + 1)$ matrix $B$. Assuming $n_j \ge n_{j-1}$ and $q_j \ge q_{j-1}$, and using the optimized versions of BitDecomp and Powersof2 mentioned above to reduce $u_j$, this computation is $\tilde{O}(d n_j^3 \log^2 q_j)$. Still, this is quasi-linear in the RLWE instantiation.

4.4 Putting the Pieces Together: Parameters, Correctness, Performance

So far we have established that the scheme is correct, assuming that the noise does not wrap modulo $q_j$ or $q_{j-1}$. Now we need to show that we can set the parameters of the scheme to ensure that such wrapping never occurs.

Our strategy for setting the parameters is to pick a "universal" bound $B$ on the noise length, and then prove, for all $j$, that a valid ciphertext under key $s_j$ for modulus $q_j$ has noise length at most $B$. This bound $B$ is quite small: polynomial in $\lambda$ and $\log q_L$, where $q_L$ is the largest modulus in our ladder. It is clear that such a bound $B$ holds for fresh ciphertexts output by FHE.Enc. (Recall the discussion from Section 3.1 where we explained that we use a noise distribution $\chi$ that is essentially independent of the modulus.) The remainder of the proof is by induction, i.e., we will show that if the bound holds for two ciphertexts $c_1, c_2$ at level $j$, our lemmas above imply that the bound also holds for the ciphertext $c' \leftarrow$ FHE.Mult(pk, $c_1$, $c_2$) at level $j - 1$. (FHE.Mult increases the noise strictly more in the worst case than FHE.Add for any reasonable choice of parameters.)

Specifically, after the first step of FHE.Mult (without the Refresh step), the noise has length at most $\gamma_R \cdot B^2$. Then, we apply the Scale function, after which the noise length is at most $(q_{j-1}/q_j) \cdot \gamma_R \cdot B^2 + \eta_{\mathrm{Scale},j}$, where $\eta_{\mathrm{Scale},j}$ is some additive term. Finally, we apply the SwitchKey function, which introduces another additive term $\eta_{\mathrm{SwitchKey},j}$. Overall, after the entire FHE.Mult step, the noise length is at most $(q_{j-1}/q_j) \cdot \gamma_R \cdot B^2 + \eta_{\mathrm{Scale},j} + \eta_{\mathrm{SwitchKey},j}$. We want to choose our parameters so that this bound is at most $B$. Suppose we set our ladder of moduli and the bound $B$ such that the following two properties hold:


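• Property 1: $B \ge 2 \cdot (\eta_{\mathrm{Scale},j} + \eta_{\mathrm{SwitchKey},j})$ for all $j$.

• Property 2: $q_j/q_{j-1} \ge 2 \cdot B \cdot \gamma_R$ for all $j$.

Then we have
$$(q_{j-1}/q_j) \cdot \gamma_R \cdot B^2 + \eta_{\mathrm{Scale},j} + \eta_{\mathrm{SwitchKey},j} \le \frac{\gamma_R \cdot B^2}{2 \cdot B \cdot \gamma_R} + \frac{B}{2} = B.$$

It only remains to set our ladder of moduli and $B$ so that Properties 1 and 2 hold.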

Unfortunately, there is some circularity in Properties 1 and 2: $q_j$ depends on $B$, which depends on $q_L$, albeit only polylogarithmically. However, it is easy to see that this circularity is not fatal. As a non-optimized example to illustrate this, set $B = \lambda^a\cdot L^b$ for very large constants $a$ and $b$, and set $q_j \approx 2^{(j+1)\cdot\omega(\log\lambda + \log L)}$. If $a$ and $b$ are large enough, $B$ dominates $\eta_{\mathrm{Scale},L} + \eta_{\mathrm{SwitchKey},L}$, which is polynomial in $\lambda$ and $\log q_L$, and hence polynomial in $\lambda$ and $L$ (Property 1 is satisfied). Since $q_j/q_{j-1}$ is super-polynomial in both $\lambda$ and $L$, it dominates $2\cdot B\cdot\gamma_R$ (Property 2 is satisfied). In fact, it works fine to set $q_j$ as a modulus having $(j+1)\cdot\mu$ bits for some $\mu = \Theta(\log\lambda + \log L)$ with small hidden constant.

Overall, we have that $q_L$, the largest modulus used in the system, has $\Theta(L\cdot(\log\lambda + \log L))$ bits, and $d\cdot n_L$ must be approximately that number times $\lambda$ for $2^\lambda$ security.
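To make the two properties concrete, the following Python sketch builds a hypothetical ladder with $\mu$ bits per level and checks, with exact rational arithmetic, that the worst-case noise bound after FHE.Mult stays at most $B$ at every level. The values of $\lambda$, $L$, $\gamma_R$, and the additive terms $\eta_{\mathrm{Scale}}$, $\eta_{\mathrm{SwitchKey}}$ are illustrative assumptions, not values derived in the text; in particular, the sketch treats the additive terms as constants and ignores their polylogarithmic dependence on $q_L$. The concrete moduli $2^{(j+1)\mu} + 1$ are likewise only one convenient odd choice.

```python
import math
from fractions import Fraction

def build_ladder(B, gamma_R, L):
    """Odd moduli q_j = 2^((j+1)*mu) + 1 for j = 0..L (a hypothetical concrete choice).

    mu is picked so that consecutive ratios exceed 2*B*gamma_R (Property 2);
    the extra bit absorbs the '+1' that keeps every modulus odd.
    """
    mu = math.ceil(math.log2(2 * B * gamma_R)) + 1
    return [2 ** ((j + 1) * mu) + 1 for j in range(L + 1)], mu

def noise_after_mult(B, gamma_R, eta_scale, eta_switch, q_j, q_prev):
    """Worst-case noise length after FHE.Mult at level j (exact rational arithmetic)."""
    return Fraction(q_prev, q_j) * gamma_R * B**2 + eta_scale + eta_switch

# Illustrative numbers only; none of these values come from the text.
lam, L = 128, 10
B = lam**2 * L                      # B = lam^a * L^b with a = 2, b = 1
gamma_R = 64
eta_scale = eta_switch = B // 4     # Property 1 then holds with equality
qs, mu = build_ladder(B, gamma_R, L)

prop1 = B >= 2 * (eta_scale + eta_switch)
prop2 = all(Fraction(qs[j], qs[j - 1]) >= 2 * B * gamma_R for j in range(1, L + 1))
bound_kept = all(noise_after_mult(B, gamma_R, eta_scale, eta_switch, qs[j], qs[j - 1]) <= B
                 for j in range(1, L + 1))
print(mu, qs[-1].bit_length(), prop1, prop2, bound_kept)
```

Because each level adds a fixed $\mu$ bits, the bit length of $q_L$ grows linearly in $L$, matching the $\Theta(L\cdot(\log\lambda + \log L))$ estimate above.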

Theorem 3. For some $\mu = \Theta(\log\lambda + \log L)$, FHE is a correct $L$-leveled FHE scheme; specifically, it correctly evaluates circuits of depth $L$ with Add and Mult gates over $R_2$. The per-gate computation is $\tilde{O}(d\cdot n_L^3\cdot\log^2 q_j) = \tilde{O}(d\cdot n_L^3\cdot L^2)$. For the LWE case (where $d = 1$), the per-gate computation is $\tilde{O}(\lambda^3\cdot L^5)$. For the RLWE case (where $n = 1$), the per-gate computation is $\tilde{O}(\lambda\cdot L^3)$.

The bottom line is that we have an RLWE-based leveled FHE scheme with per-gate computation that is only quasi-linear in the security parameter, albeit with somewhat high dependence on the number of levels in the circuit.

Let us pause at this point to reconsider the performance of previous FHE schemes in comparison to our new scheme. Specifically, as we discussed in the Introduction, in previous SWHE schemes, the ciphertext size is at least $\tilde{O}(\lambda\cdot d^2)$, where $d$ is the degree of the circuit being evaluated. One may view our new scheme as a very powerful SWHE scheme in which this dependence on degree has been replaced with a similar dependence on depth. (Recall that the degree of a circuit may be exponential in its depth.) Since polynomial-size circuits have polynomial depth, which is certainly not true of degree, our scheme can efficiently evaluate arbitrary circuits without resorting to bootstrapping.

4.5  Security 

The security of FHE follows by a standard hybrid argument from the security of $E$, the basic scheme described in Section 3.1. We omit the details.

5  Optimizations 

Despite the fact that our new FHE scheme has per-gate computation only quasi-linear in the security parameter, we present several significant ways of optimizing it. We focus primarily on the RLWE-based scheme, since it is much more efficient.

Our first optimization is batching. Batching allows us to reduce the per-gate computation from quasi-linear in the security parameter to polylogarithmic. In more detail, we show that evaluating a function $f$ homomorphically on $\ell = \Omega(\lambda)$ blocks of encrypted data requires only polylogarithmically (in terms of the security parameter $\lambda$) more computation than evaluating $f$ on the unencrypted data. (The overhead is still polynomial in the depth $L$ of the circuit computing $f$.) Batching works essentially by packing multiple plaintexts into each ciphertext.

Next, we reintroduce bootstrapping as an optimization rather than a necessity (Section 5.2). Bootstrapping allows us to achieve per-gate computation quasi-quadratic in the security parameter, independent of the number of levels in the circuit being evaluated.

In Section 5.3, we show that batching the bootstrapping function is a powerful combination. With this optimization, circuits whose levels mostly have width at least $\lambda$ can be evaluated homomorphically with only $\tilde{O}(\lambda)$ per-gate computation, independent of the number of levels.

Finally,  Section  5.5  presents  a  few  other  miscellaneous  optimizations. 

5.1  Batching 

Suppose we want to evaluate the same function $f$ on $\ell$ blocks of encrypted data. (Or, similarly, suppose we want to evaluate the same encrypted function $f$ on $\ell$ blocks of plaintext data.) Can we do this using less than $\ell$ times the computation needed to evaluate $f$ on one block of data? Can we batch?

For example, consider a keyword search function that returns ‘1’ if the keyword is present in the data and ‘0’ if it is not. The keyword search function is mostly composed of a large number of equality tests that compare the target word $w$ to all of the different subsequences of data; this is followed by an OR of the equality test results. All of these equality tests involve running the same $w$-dependent function on different blocks of data. If we could batch these equality tests, it could significantly reduce the computation needed to perform keyword search homomorphically.
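As a plaintext-level illustration (no encryption), the keyword search can be written as an arithmetic circuit over $\mathbb{Z}_2$ that uses only Add and Mult gates: every window of the data is compared with $w$ by the same $w$-dependent equality test, and the results are combined with an OR. The particular gate formulas ($1 + x_k + w_k$ for bitwise equality, and De Morgan's law for OR) are our own choice for this sketch, not taken from the text.

```python
import random

def eq_bits(x, w):
    """Equality test using only Add and Mult gates mod 2: prod_k (1 + x_k + w_k)."""
    out = 1
    for xk, wk in zip(x, w):
        out = out * (1 + xk + wk) % 2      # 1 + x_k + w_k = 1 (mod 2) exactly when x_k == w_k
    return out

def or_all(bits):
    """OR of bits mod 2 via De Morgan: 1 + prod_k (1 + b_k)."""
    acc = 1
    for b in bits:
        acc = acc * (1 + b) % 2
    return (1 + acc) % 2

def keyword_search(data, w):
    """1 if the bit-string w occurs in data, else 0 (hypothetical circuit layout)."""
    n = len(w)
    # Every window runs the same w-dependent test: these are the candidates for batching.
    tests = [eq_bits(data[i:i + n], w) for i in range(len(data) - n + 1)]
    return or_all(tests)

random.seed(7)
data = [random.randrange(2) for _ in range(64)]
w = data[20:28]                        # a keyword that certainly occurs in data
print(keyword_search(data, w))
```

The list `tests` is exactly the collection of independent, identical computations on different blocks that batching would place in different plaintext slots.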

If  we  use  bootstrapping  as  an  optimization  (see  Section  5.2),  then  obviously  we  will  be  running  the 
decryption  function  homomorphically  on  multiple  blocks  of  data  -  namely,  the  multiple  ciphertexts  that 
need  to  be  refreshed.  Can  we  batch  the  bootstrapping  function?  If  we  could,  then  we  might  be  able  to 
drastically  reduce  the  average  per-gate  cost  of  bootstrapping. 

Smart  and  Vercauteren  [21]  were  the  first  to  rigorously  analyze  batching  in  the  context  of  FHE.  In 
particular,  they  observed  that  ideal-lattice-based  (and  RLWE-based)  ciphertexts  can  have  many  plaintext 
slots,  associated  to  the  factorization  of  the  plaintext  space  into  algebraic  ideals. 

When we apply batching to our new RLWE-based FHE scheme, the results are pretty amazing. Evaluating $f$ homomorphically on $\ell = \Omega(\lambda)$ blocks of encrypted data requires only polylogarithmically (in terms of the security parameter $\lambda$) more computation than evaluating $f$ on the unencrypted data. (The overhead is still polynomial in the depth $L$ of the circuit computing $f$.) As we will see later, for circuits whose levels mostly have width at least $\lambda$, batching the bootstrapping function (i.e., batching homomorphic evaluation of the decryption function) allows us to reduce the per-gate computation of our bootstrapped scheme from $\tilde{O}(\lambda^2)$ to $\tilde{O}(\lambda)$ (independent of $L$).

To make the exposition a bit simpler, in our RLWE-based instantiation where $R = \mathbb{Z}[x]/(x^d + 1)$, we will not use $R_2$ as our plaintext space, but instead use a plaintext space $R_p$ that is isomorphic to the direct product $R_{\mathfrak{p}_1}\times\cdots\times R_{\mathfrak{p}_d}$ of many plaintext spaces (think Chinese remaindering), so that evaluating a function once over $R_p$ implicitly evaluates the function many times in parallel over the respective smaller plaintext spaces. The $\mathfrak{p}_i$'s will be ideals in our ring $R = \mathbb{Z}[x]/(x^d + 1)$. (One could still use $R_2$ as in [21], but the number theory there is a bit more involved.)

5.1.1  Some  Number  Theory 

Let us take a very brief tour of algebraic number theory. Suppose $p$ is a prime number satisfying $p \equiv 1 \pmod{2d}$, and let $a$ be a primitive $2d$-th root of unity modulo $p$. Then $x^d + 1$ factors completely into linear polynomials modulo $p$; in particular, $x^d + 1 = \prod_{i=1}^{d}(x - a_i) \bmod p$, where $a_i = a^{2i-1} \bmod p$. In some sense, the converse of the above statement is also true, and this is the essence of reciprocity: in the ring $R = \mathbb{Z}[x]/(x^d + 1)$ the prime integer $p$ is not actually prime, but rather it splits completely into prime ideals in $R$, i.e., $(p) = \prod_{i=1}^{d}\mathfrak{p}_i$. The ideal $\mathfrak{p}_i$ equals $(p, x - a_i)$, namely the set of all $R$-elements that can be expressed as $r_1\cdot p + r_2\cdot(x - a_i)$ for some $r_1, r_2 \in R$. Each ideal $\mathfrak{p}_i$ has norm $p$; that is, roughly speaking, a $1/p$ fraction of $R$-elements are in $\mathfrak{p}_i$, or, more formally, the $p$ cosets $0 + \mathfrak{p}_i, 1 + \mathfrak{p}_i, \ldots, (p-1) + \mathfrak{p}_i$ partition $R$.
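The factorization can be checked numerically. The following Python sketch uses toy parameters $d = 8$ and $p = 17$ (so $p \equiv 1 \pmod{16}$), which are far too small for security and are chosen only for illustration. It finds an $a$ with $a^d \equiv -1 \pmod p$ (for $d$ a power of two, such an $a$ is a primitive $2d$-th root of unity), multiplies out $\prod_i (x - a_i)$ modulo $p$, and checks that an element of the form $r_1\cdot p + r_2\cdot(x - a_i)$ is sent to 0 by the map $r \mapsto r(a_i) \bmod p$, whose kernel is $\mathfrak{p}_i$.

```python
import random

def poly_mul(a, b, p=None):
    """Schoolbook product of coefficient lists (constant term first), optionally reduced mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out if p is None else [v % p for v in out]

def reduce_negacyclic(r, d):
    """Reduce an integer polynomial modulo x^d + 1, using x^d = -1."""
    out = [0] * d
    for k, coef in enumerate(r):
        out[k % d] += -coef if (k // d) % 2 else coef
    return out

def evaluate(r, x, p):
    return sum(c * pow(x, k, p) for k, c in enumerate(r)) % p

d, p = 8, 17                                  # toy parameters: p = 1 (mod 2d), 2d = 16
a = next(g for g in range(2, p) if pow(g, d, p) == p - 1)   # a^d = -1 mod p (d a power of two)
roots = [pow(a, 2 * i - 1, p) for i in range(1, d + 1)]     # a_i = a^(2i-1) mod p

# x^d + 1 = prod_i (x - a_i)  (mod p)
prod = [1]
for ai in roots:
    prod = poly_mul(prod, [-ai, 1], p)

# An element r1*p + r2*(x - a_i) of p_i = (p, x - a_i) vanishes at a_i modulo p.
random.seed(0)
i = 3
r1 = [random.randrange(-5, 6) for _ in range(d)]
r2 = [random.randrange(-5, 6) for _ in range(d)]
t = poly_mul(r2, [-roots[i - 1], 1])            # degree d, reduced below
elem = reduce_negacyclic([x + p * y for x, y in zip(t, r1 + [0])], d)
print(prod, evaluate(elem, roots[i - 1], p))
```

Evaluating at $a_i$ is compatible with the reduction modulo $x^d + 1$ because $a_i^d \equiv -1 \pmod p$.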

These ideals are relatively prime, and so they behave like relatively prime integers. In particular, the Chinese Remainder Theorem applies: $R_p \cong R_{\mathfrak{p}_1}\times\cdots\times R_{\mathfrak{p}_d}$.

Although the prime ideals $\{\mathfrak{p}_i\}$ are relatively prime, they are close siblings, and it is easy, in some sense, to switch from one to another. One fact that we will use (when we finally apply batching to bootstrapping) is that, for any $i, j$, there is an automorphism $\sigma_{i\to j}$ over $R$ that maps elements of $\mathfrak{p}_i$ to elements of $\mathfrak{p}_j$. Specifically, $\sigma_{i\to j}$ works by mapping an $R$-element $r = r(x) = r_{d-1}x^{d-1} + \cdots + r_1 x + r_0$ to $r(x^{e_{ij}}) = r_{d-1}x^{e_{ij}(d-1) \bmod 2d} + \cdots + r_1 x^{e_{ij}} + r_0$, where $e_{ij}$ is some odd number in $[1, 2d]$. Notice that this automorphism just permutes the coefficients of $r$ (up to sign, since $x^d = -1$ in $R$) and fixes the free coefficient. Notationally, we will use $\sigma_{i\to j}(v)$ to refer to the vector that results from applying $\sigma_{i\to j}$ coefficient-wise to $v$.
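In the following Python sketch (toy parameters $d = 8$ and $p = 17$, illustration only), $\sigma_{i\to j}$ is implemented as $r(x) \mapsto r(x^{e})$ with $x^d = -1$. The text only says that $e_{ij}$ is some odd number; choosing it as the solution of $(2j-1)\,e \equiv 2i-1 \pmod{2d}$, so that $a_j^{e} = a_i$, is our assumption. With that choice, $\sigma_{i\to j}(r)$ reduces modulo $\mathfrak{p}_j$ to the same value that $r$ has modulo $\mathfrak{p}_i$, so in particular $\mathfrak{p}_i$ is mapped into $\mathfrak{p}_j$.

```python
import random

def sigma(r, e, d):
    """r(x) -> r(x^e) in Z[x]/(x^d + 1) for odd e: a signed permutation of the coefficients."""
    out = [0] * d
    for k, coef in enumerate(r):
        pos = k * e % (2 * d)
        if pos < d:
            out[pos] += coef
        else:
            out[pos - d] -= coef          # x^(d + t) = -x^t in R
    return out

def evaluate(r, x, p):
    return sum(c * pow(x, k, p) for k, c in enumerate(r)) % p

def e_ij(i, j, d):
    """One valid odd exponent: solves (2j-1)*e = 2i-1 (mod 2d), so that a_j^e = a_i."""
    return (2 * i - 1) * pow(2 * j - 1, -1, 2 * d) % (2 * d)

d, p = 8, 17
a = next(g for g in range(2, p) if pow(g, d, p) == p - 1)
roots = [pow(a, 2 * k - 1, p) for k in range(1, d + 1)]      # a_1, ..., a_d

random.seed(1)
r = [random.randrange(-8, 9) for _ in range(d)]
i, j = 2, 5
e = e_ij(i, j, d)
s = sigma(r, e, d)
# sigma_{i->j}(r) mod p_j equals r mod p_i.
print(e, evaluate(s, roots[j - 1], p) == evaluate(r, roots[i - 1], p))
```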
5.1.2  How  Batching  Works

Deploying batching inside our scheme FHE is quite straightforward. First, we pick a prime $p \equiv 1 \pmod{2d}$ of size polynomial in the security parameter. (One should exist under the GRH.)

The next step is simply to recognize that our scheme FHE works just fine when we replace the original plaintext space $R_2$ with $R_p$. There is nothing especially magical about the number 2. In the basic scheme $E$ described in Section 3.1, E.PublicKeyGen(params, sk) is modified in the obvious way so that $A\cdot s = p\cdot e$ rather than $2\cdot e$. (This modification induces a similar modification in SwitchKeyGen.) Decryption becomes $m = [[\langle c, s\rangle]_q]_p$. Homomorphic operations use mod-$p$ gates rather than boolean gates, and it is easy (if desired) to emulate boolean gates with mod-$p$ gates; e.g., we can compute $\mathrm{XOR}(a, b)$ for $a, b \in \{0, 1\}$ using mod-$p$ gates for any $p$ as $a + b - 2ab$. For modulus switching, we use $\mathrm{Scale}(c_1, q_j, q_{j-1}, p)$ rather than $\mathrm{Scale}(c_1, q_j, q_{j-1}, 2)$. The larger rounding error from this new scaling procedure increases the noise slightly, but this additive noise is still polynomial in the security parameter and the number of levels, and thus is still consistent with our setting of parameters. In short, FHE can easily be adapted to work with a plaintext space $R_p$ for $p$ of polynomial size.
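The following Python sketch illustrates the two mod-$p$ ingredients in the integer (LWE-style, $d = 1$) setting: XOR emulated with mod-$p$ gates, and a scaling step that maps $c$ to the integer vector closest to $(q_{j-1}/q_j)\cdot c$ subject to $c' \equiv c \pmod p$. The ciphertexts are toy values constructed directly from a secret key (this is not the scheme's encryption algorithm), and the moduli are hypothetical. As an assumption of the sketch, the two moduli are chosen congruent modulo $p$, the analogue of taking all moduli odd when the plaintext space is $R_2$; with that choice, the decrypted message survives the scaling whenever the new noise stays below $q_{j-1}/2$.

```python
import random
from fractions import Fraction

def centered(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def scale(c, q, q_new, p):
    """Scale(c, q, q_new, p): entrywise, the integer closest to (q_new/q)*c_i that is = c_i (mod p)."""
    out = []
    for ci in c:
        target = Fraction(ci * q_new, q)
        k = round((target - ci) / p)        # move c_i by the multiple of p nearest the target
        out.append(ci + k * p)
    return out

def decrypt(c, s, q, p):
    """m = [[<c, s>]_q]_p, returned in [0, p)."""
    return centered(sum(x * y for x, y in zip(c, s)), q) % p

def xor_mod_p(a, b, p):
    """Boolean XOR emulated with mod-p gates: a + b - 2ab."""
    return (a + b - 2 * a * b) % p

random.seed(2)
p, n = 17, 8
q = p * 2**36 + 1          # hypothetical moduli; both are 1 mod p,
q_new = p * 2**16 + 1      # so q = q_new (mod p)
ok = True
for _ in range(200):
    m = random.randrange(p)
    s = [1] + [random.choice([-1, 0, 1]) for _ in range(n - 1)]
    c = [0] + [random.randrange(q) for _ in range(n - 1)]
    noise = m + p * random.randrange(-3, 4)            # target: <c, s> = m + p*e (mod q)
    c[0] = (noise - sum(x * y for x, y in zip(c[1:], s[1:]))) % q
    c_new = scale(c, q, q_new, p)
    ok = ok and decrypt(c, s, q, p) == m and decrypt(c_new, s, q_new, p) == m
print(ok)
```

The congruence matters because $\langle c', s\rangle \equiv \langle c, s\rangle \pmod p$, while the multiples of the modulus that are dropped during decryption change from $k\cdot q_j$ to $k\cdot q_{j-1}$; these agree modulo $p$ only when $q_j \equiv q_{j-1} \pmod p$.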

The final step is simply to recognize that, by the Chinese Remainder Theorem, evaluating an arithmetic circuit over $R_p$ on input $x \in R_p$ implicitly evaluates, for each $i$, the same arithmetic circuit over $R_{\mathfrak{p}_i}$ on input $x$ projected down to $R_{\mathfrak{p}_i}$. The evaluations modulo the various prime ideals do not “mix” or interact with each other.
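A plaintext-only Python sketch of this fact (no encryption), with toy parameters $d = 8$ and $p = 17$: since each $\mathfrak{p}_i$ has norm $p$, each slot $R_{\mathfrak{p}_i}$ is isomorphic to $\mathbb{Z}_p$ via $r \mapsto r(a_i) \bmod p$. Packing $d$ slot values into one element of $R_p$ (CRT interpolation, computed here as an inverse negacyclic DFT, which is our choice of method) and evaluating a circuit once over $R_p$ gives the same slot values as evaluating it $d$ times over $\mathbb{Z}_p$. The circuit itself is a hypothetical example.

```python
import random

d, p = 8, 17                                   # toy parameters: p = 1 (mod 2d)
a = next(g for g in range(2, p) if pow(g, d, p) == p - 1)
roots = [pow(a, 2 * i - 1, p) for i in range(1, d + 1)]

def decode(r):
    """Project r in R_p onto each slot: r mod p_i is r(a_i) mod p."""
    return [sum(c * pow(ai, k, p) for k, c in enumerate(r)) % p for ai in roots]

def encode(values):
    """CRT: the unique r in R_p with r(a_i) = values[i] (an inverse negacyclic DFT)."""
    d_inv = pow(d, -1, p)
    return [d_inv * sum(v * pow(ai, -k, p) for v, ai in zip(values, roots)) % p
            for k in range(d)]

def ring_add(x, y):
    return [(u + v) % p for u, v in zip(x, y)]

def ring_mul(x, y):
    """Product in Z_p[x]/(x^d + 1)."""
    out = [0] * d
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            sign = 1 if i + j < d else -1
            out[(i + j) % d] += sign * xi * yj
    return [v % p for v in out]

def circuit(x, y, add, mul):
    """A hypothetical example circuit, written once for any ring: x*y + x*x + y."""
    return add(add(mul(x, y), mul(x, x)), y)

random.seed(3)
xs = [random.randrange(p) for _ in range(d)]
ys = [random.randrange(p) for _ in range(d)]
packed = circuit(encode(xs), encode(ys), ring_add, ring_mul)     # one evaluation over R_p
per_slot = [circuit(x, y, lambda u, v: (u + v) % p, lambda u, v: u * v % p)
            for x, y in zip(xs, ys)]                              # d evaluations over Z_p
print(decode(packed) == per_slot)
```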

Theorem 4. Let $p \equiv 1 \pmod{2d}$ be a prime of size polynomial in $\lambda$. The RLWE-based instantiation of FHE using the ring $R = \mathbb{Z}[x]/(x^d + 1)$ can be adapted to use the plaintext space $R_p \cong \prod_{i=1}^{d} R_{\mathfrak{p}_i}$ while preserving correctness and the same asymptotic performance. For any boolean circuit $f$ of depth $L$, the scheme can homomorphically evaluate $f$ on $\ell$ sets of inputs with per-gate computation $\tilde{O}(\lambda\cdot L^3/\min\{\ell, d\})$.

When $\ell \ge \lambda$, the per-gate computation is only polylogarithmic in the security parameter (still cubic in $L$).

5.2  Bootstrapping  as  an  Optimization 

Bootstrapping  is  no  longer  strictly  necessary  to  achieve  leveled  FHE.  However,  in  some  settings,  it  may  have 
some  advantages: 

•  Performance:  The  per-gate  computation  is  independent  of  the  depth  of  the  circuit  being  evaluated. 

•  Flexibility:  Assuming  circular  security,  a  bootstrapped  scheme  can  perform  homomorphic  evaluations 
indefinitely  without  needing  to  specify  in  advance,  during  Setup,  a  bound  on  the  number  of  circuit 
levels. 

•  Memory:  Bootstrapping  permits  short  ciphertexts  -  e.g.,  encrypted  using  AES  -  to  be  de-compressed 
to  longer  ciphertexts  that  permit  homomorphic  operations.  Bootstrapping  allows  us  to  save  memory 
by  storing  data  encrypted  in  the  compressed  form  -  e.g.,  under  AES. 

Here,  we  revisit  bootstrapping,  viewing  it  as  an  optimization  rather  than  a  necessity.  We  also  reconsider 
the  scheme  FHE  that  we  described  in  Section  3,  viewing  the  scheme  not  as  an  end  in  itself,  but  rather  as  a  very 
powerful SWHE scheme whose performance degrades polynomially in the depth of the circuit being evaluated, as
opposed  to  previous  SWHE  schemes  whose  performance  degrades  polynomially  in  the  degree.  In  particular, 
we  analyze  how  efficiently  it  can  evaluate  its  decryption  function,  as  needed  to  bootstrap.  Not  surprisingly, 
our  faster  SWHE  scheme  can  also  bootstrap  faster.  The  decryption  function  has  only  logarithmic  depth 
and  can  be  evaluated  homomorphically  in  time  quasi-quadratic  in  the  security  parameter  (for  the  RLWE 
instantiation),  giving  a  bootstrapped  scheme  with  quasi-quadratic  per-gate  computation  overall. 
5.2.1  Decryption  as  a  Circuit  of  Quasi-Linear  Size  and  Logarithmic  Depth

Recall that the decryption function is $m = [[\langle c, s\rangle]_q]_2$. Suppose that we are given the “bits” (elements in $R_2$) of $s$ as input, and we want to compute $[[\langle c, s\rangle]_q]_2$ using an arithmetic circuit that has Add and Mult gates over $R_2$. (When we bootstrap, of course we are given the bits of $s$ in encrypted form.) Note that we will run the decryption function homomorphically on level-0 ciphertexts, i.e., when $q$ is small, only polynomial in the security parameter. What is the complexity of this circuit? Most importantly for our purposes, what is its depth and size? The answer is that we can perform decryption with $\tilde{O}(\lambda)$ computation and $O(\log\lambda)$ depth. Thus, in the RLWE instantiation, we can evaluate the decryption function homomorphically using our new scheme with quasi-quadratic computation. (For the LWE instantiation, the bootstrapping computation is quasi-quartic.)

First, let us consider the LWE case, where $c$ and $s$ are $n$-dimensional integer vectors. Obviously, each product $c[i]\cdot s[i]$ can be written as the sum of at most $\log q$ “shifts” of $s[i]$. These horizontal shifts of $s[i]$ use at most $2\log q$ columns. Thus, $\langle c, s\rangle$ can be written as the sum of $n\cdot\log q$ numbers, where each number has $2\log q$ digits. As discussed in [8], we can use the three-for-two trick, which takes as input three numbers in binary (of arbitrary length) and outputs (using constant depth) two binary numbers with the same sum. Thus, with $O(\log(n\cdot\log q)) = O(\log n + \log\log q)$ depth and $O(n\log^2 q)$ computation, we obtain two numbers with the desired sum, each having $O(\log n + \log q)$ bits. We can sum the final two numbers with $O(\log\log n + \log\log q)$ depth and $O(\log n + \log q)$ computation. So far, we have used depth $O(\log n + \log\log q)$ and $O(n\log^2 q)$ computation to compute $\langle c, s\rangle$. Reducing this value modulo $q$ is an operation akin to division, for which there are circuits of size $\mathrm{polylog}(q)$ and depth $\log\log q$. Finally, reducing modulo 2 just involves dropping the most significant bits. Overall, since we are interested only in the case where $\log q = O(\log\lambda)$, we have that decryption requires $\tilde{O}(\lambda)$ computation and depth $O(\log\lambda)$.
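The following Python sketch computes $\langle c, s\rangle$ gate by gate over $\mathbb{Z}_2$ for a public vector $c$ and a secret $s$ given as bits (in the clear, not encrypted). It builds the shifted copies of $s[i]$, reduces them with three-for-two (carry-save) steps, and counts the carry-save layers, which grow only logarithmically in the number of addends. As simplifications of our own, the final two numbers are added with a plain ripple-carry adder rather than a log-depth adder, and the reductions modulo $q$ and modulo 2 are done in ordinary integer arithmetic; the modulus is a hypothetical small value.

```python
import math
import random

def to_bits(x, width):
    """Little-endian bits of a non-negative integer."""
    return [(x >> k) & 1 for k in range(width)]

def from_bits(bits):
    return sum(b << k for k, b in enumerate(bits))

def pad(v, width):
    return v + [0] * (width - len(v))

def three_for_two(a, b, c):
    """Carry-save step: three binary numbers -> two with the same sum.
    Each output bit uses O(1) Add (XOR) and Mult (AND) gates over Z_2: constant depth."""
    w = max(len(a), len(b), len(c)) + 1             # room for the carry-out
    a, b, c = pad(a, w), pad(b, w), pad(c, w)
    total = [(x + y + z) % 2 for x, y, z in zip(a, b, c)]
    carry = [(x * y + y * z + z * x) % 2 for x, y, z in zip(a, b, c)]   # majority
    return total, [0] + carry[:-1]                   # carries move up one column

def ripple_add(a, b):
    """Final two-number sum; a simple ripple-carry adder (deeper than the text's adder)."""
    w = max(len(a), len(b)) + 1
    a, b = pad(a, w), pad(b, w)
    out, carry = [], 0
    for x, y in zip(a, b):
        out.append((x + y + carry) % 2)
        carry = (x * y + carry * (x + y)) % 2
    return out

def inner_product_circuit(c, s_bits, log_q):
    """<c, s> for public integers c[i] and secret s[i] given as bit lists."""
    # c[i] * s[i] is the sum of the shifts s[i] << k over the set bits k of c[i].
    addends = [[0] * k + bits for ci, bits in zip(c, s_bits)
               for k in range(log_q) if (ci >> k) & 1]
    layers = 0
    while len(addends) > 2:
        full = len(addends) - len(addends) % 3
        reduced = []
        for t in range(0, full, 3):
            reduced.extend(three_for_two(*addends[t:t + 3]))
        addends = reduced + addends[full:]           # 0-2 leftovers pass through
        layers += 1
    addends += [[0]] * (2 - len(addends))
    return ripple_add(*addends), layers

random.seed(6)
log_q, n = 10, 16
q = 1021                                          # hypothetical small modulus below 2**log_q
c = [random.randrange(q) for _ in range(n)]
s = [random.randrange(q) for _ in range(n)]
bits, layers = inner_product_circuit(c, [to_bits(x, log_q) for x in s], log_q)
total = from_bits(bits)
r = total % q
m = (r - q if r > q // 2 else r) % 2                # [[<c, s>]_q]_2, in plain integers here
n_addends = sum(bin(ci).count("1") for ci in c)
print(total == sum(x * y for x, y in zip(c, s)), layers, n_addends)
```

Each carry-save layer shrinks the number of addends by roughly a factor of $3/2$, which is where the $O(\log(n\cdot\log q))$ depth comes from.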

For  the  RLWE  case,  we  can  use  the  R2  plaintext  space  to  emulate  the  simpler  plaintext  space  Z2.  Using 
Z2,  the  analysis  is  basically  the  same  as  above,  except  that  we  mention  that  the  DFT  is  used  to  multiply 
elements  in  R. 

In  practice,  it  would  be  useful  to  tighten  up  this  analysis  by  reducing  the  polylogarithmic  factors  in 
the  computation  and  the  constants  in  the  depth.  Most  likely  this  could  be  done  by  evaluating  decryption 
using symmetric polynomials [8, 9] or with a variant of the “grade-school addition” approach used in the Gentry-Halevi implementation [10].

5.2.2  Bootstrapping  Lazily 

Bootstrapping is rather expensive computationally. In particular, the cost of bootstrapping a ciphertext is greater than the cost of a homomorphic operation by approximately a factor of $\lambda$. This suggests the question: can we lower the per-gate computation of a bootstrapped scheme by bootstrapping lazily, i.e., applying the refresh procedure only at a $1/L$ fraction of the circuit levels for some well-chosen $L$ [11]? Here we show that the answer is yes. By bootstrapping lazily for $L = \Theta(\log\lambda)$, we can lower the per-gate computation by a logarithmic factor.

Let us present this result somewhat abstractly. Suppose that the per-gate computation for an $L$-level no-bootstrapping FHE scheme is $f(\lambda, L) = \lambda^{a_1}\cdot L^{a_2}$. (We ignore logarithmic factors in $f$, since they will not affect the analysis, but one can imagine that they add a very small $\epsilon$ to the exponent.) Suppose that bootstrapping a ciphertext requires a $c$-depth circuit. Since we want to be capable of evaluating depth $L$ after evaluating the $c$ levels needed to bootstrap a ciphertext, the bootstrapping procedure needs to begin with ciphertexts that can be used in a $(c + L)$-depth circuit. Consequently, let us say that the computation needed to bootstrap a ciphertext is $g(\lambda, c + L)$, where $g(\lambda, x) = \lambda^{b_1}\cdot x^{b_2}$. The overall per-gate computation is approximately $f(\lambda, L) + g(\lambda, c + L)/L$, a quantity that we seek to minimize.
We have the following lemma.

Lemma 10. Let $f(\lambda, L) = \lambda^{a_1}\cdot L^{a_2}$ and $g(\lambda, L) = \lambda^{b_1}\cdot L^{b_2}$ for constants $b_1 > a_1$ and $b_2 > a_2 > 1$. Let $h(\lambda, L) = f(\lambda, L) + g(\lambda, c + L)/L$ for $c = \Theta(\log\lambda)$. Then, for fixed $\lambda$, $h(\lambda, L)$ has a minimum for $L \in [(c-1)/(b_2-1),\, c/(b_2-1)]$, i.e., at some $L = \Theta(\log\lambda)$.

Proof. Clearly $h(\lambda, L) = +\infty$ at $L = 0$; then it decreases toward a minimum, and finally it eventually increases again as $L$ goes toward infinity. Thus, $h(\lambda, L)$ has a minimum at some positive value of $L$. Since $f(\lambda, L)$ is monotonically increasing (i.e., its derivative is positive), the minimum must occur where the derivative of $g(\lambda, c + L)/L$ is negative. We have

$$\begin{aligned}
\frac{d}{dL}\,\frac{g(\lambda, c+L)}{L} &= \frac{g'(\lambda, c+L)}{L} - \frac{g(\lambda, c+L)}{L^2} \\
&= \frac{b_2\cdot\lambda^{b_1}\cdot(c+L)^{b_2-1}}{L} - \frac{\lambda^{b_1}\cdot(c+L)^{b_2}}{L^2} \\
&= \frac{\lambda^{b_1}\cdot(c+L)^{b_2-1}}{L^2}\cdot(b_2\cdot L - c - L),
\end{aligned}$$

which becomes positive when $L > c/(b_2 - 1)$; i.e., the derivative is negative only when $L = O(\log\lambda)$. For $L < (c-1)/(b_2-1)$, we have that the above derivative is less than $-\lambda^{b_1}\cdot(c+L)^{b_2-1}/L^2$, which dominates the positive derivative of $f$. Therefore, for large enough values of $\lambda$, $h(\lambda, L)$ has its minimum at some $L \in [(c-1)/(b_2-1),\, c/(b_2-1)]$. □
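The lemma can be sanity-checked numerically. The Python sketch below uses hypothetical exponents $a_1 = 1$, $a_2 = 3$, $b_1 = 2$, $b_2 = 4$ (chosen only to satisfy $b_1 > a_1$ and $b_2 > a_2 > 1$), $\lambda = 2^{20}$ and $c = \log_2\lambda = 20$, and treats $L$ as a real variable as the proof does. It evaluates the derivative of $h$ at both ends of the interval $[(c-1)/(b_2-1),\, c/(b_2-1)]$ and runs a coarse grid search for the minimizer.

```python
import math

def h(L, lam, c, a1, a2, b1, b2):
    """Amortized per-gate cost f(lam, L) + g(lam, c + L) / L."""
    return lam**a1 * L**a2 + lam**b1 * (c + L)**b2 / L

def dh(L, lam, c, a1, a2, b1, b2):
    """dh/dL = f'(L) + lam^b1 (c+L)^(b2-1) / L^2 * (b2*L - c - L), as in the proof."""
    df = a2 * lam**a1 * L**(a2 - 1)
    dg = lam**b1 * (c + L)**(b2 - 1) / L**2 * (b2 * L - c - L)
    return df + dg

lam = 2**20
c = math.log2(lam)                  # decryption depth c = Theta(log lam); 20 here (hypothetical)
a1, a2, b1, b2 = 1, 3, 2, 4         # hypothetical exponents with b1 > a1 and b2 > a2 > 1
args = (lam, c, a1, a2, b1, b2)
lo, hi = (c - 1) / (b2 - 1), c / (b2 - 1)

step = 0.01
grid = [k * step for k in range(50, 6001)]           # L from 0.5 to 60
best = min(grid, key=lambda L: h(L, *args))
print(f"interval [{lo:.3f}, {hi:.3f}], derivative signs {dh(lo, *args) < 0}, "
      f"{dh(hi, *args) > 0}, grid minimizer {best:.2f}")
```

A negative derivative at the left end and a positive one at the right end bracket the minimum inside the interval, as the lemma states; the grid search agrees up to the grid spacing.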

This lemma basically says that, since homomorphic decryption takes $O(\log\lambda)$ levels and its cost is super-linear and dominates that of normal homomorphic operations (FHE.Add and FHE.Mult), it makes sense to bootstrap lazily, in particular once every $\Theta(\log\lambda)$ levels. (If one bootstrapped even more lazily than this, the super-linear cost of bootstrapping would cause the amortized per-gate cost of bootstrapping alone to increase.) It is easy to see that, since the per-gate computation is dominated by bootstrapping, bootstrapping lazily every $\Theta(\log\lambda)$ levels reduces the per-gate computation by a factor of $\Theta(\log\lambda)$.

5.3  Batching  the  Bootstrapping  Operation 

Suppose  that  we  are  evaluating  a  circuit  homomorphically,  that  we  are  currently  at  a  level  in  the  circuit  that 
has  at  least  d  gates  (where  d  is  the  dimension  of  our  ring),  and  that  we  want  to  bootstrap  (refresh)  all  of 
the  ciphertexts  corresponding  to  the  respective  wires  at  that  level.  That  is,  we  want  to  homomorphically 
evaluate  the  decryption  function  at  least  d  times  in  parallel.  This  seems  like  an  ideal  place  to  apply  batching. 

However, there are some nontrivial problems. In Section 5.1, our focus was rather limited. For example, we did not consider whether homomorphic operations could continue after the batched computation. Indeed, at first glance, it would appear that homomorphic operations cannot continue, since, after batching, the encrypted data is partitioned into non-interacting relatively-prime plaintext slots, whereas the whole point of homomorphic encryption is that the encrypted data can interact (within a common plaintext slot). Similarly, we did not consider homomorphic operations before the batched computation. Somehow, we need the input to the batched computation to come pre-partitioned into the different plaintext slots.

What we need are Pack and Unpack functions that allow the batching procedure to interface with “normal” homomorphic operations. One may think of the Pack and Unpack functions as an on-ramp to and an exit-ramp from the “fast lane” of batching. Let us say that normal homomorphic operations will always use the plaintext slot $R_{\mathfrak{p}_1}$. Roughly, the Pack function should take a bunch of ciphertexts $c_1, \ldots, c_d$ that encrypt messages $m_1, \ldots, m_d \in \mathbb{Z}_p$ under key $s_1$ for modulus $q$ and plaintext slot $R_{\mathfrak{p}_1}$, and then aggregate them into a single ciphertext $c$ under some possibly different key $s_2$ for modulus $q$ and plaintext slot $R_p \cong \prod_{i=1}^{d} R_{\mathfrak{p}_i}$, so that correctness holds with respect to all of the different plaintext slots, i.e., $m_i = [[\langle c, s_2\rangle]_q]_{\mathfrak{p}_i}$ for all $i$. The Pack function thus allows normal homomorphic operations to feed into the batch operation.
The Unpack function should accept the output of a batched computation, namely a ciphertext $c'$ such that $m_i = [[\langle c', s_1'\rangle]_q]_{\mathfrak{p}_i}$ for all $i$, and then de-aggregate this ciphertext by outputting ciphertexts $c_1', \ldots, c_d'$ under some possibly different common secret key $s_2'$ such that $m_i = [[\langle c_i', s_2'\rangle]_q]_{\mathfrak{p}_1}$ for all $i$. Now that all of the ciphertexts are under a common key and plaintext slot, normal homomorphic operations can resume. With such Pack and Unpack functions, we could indeed batch the bootstrapping operation. For circuits of large width (say, at least $d$) we could reduce the per-gate bootstrapping computation by a factor of $d$, making it only quasi-linear in $\lambda$. Assuming the Pack and Unpack functions have complexity at most quasi-quadratic in $d$ (per-gate this is only quasi-linear, since Pack and Unpack operate on $d$ gates), the overall per-gate computation of a batched-bootstrapped scheme becomes only quasi-linear.

Here, we describe suitable Pack and Unpack functions. These functions will make heavy use of the automorphisms $\sigma_{i\to j}$ over $R$ that map elements of $\mathfrak{p}_i$ to elements of $\mathfrak{p}_j$. (See Section 5.1.1.) We note that Smart and Vercauteren [21] used these automorphisms to construct something similar to our Pack function (though for unpacking they resorted to bootstrapping). We also note that Lyubashevsky, Peikert and Regev [14] used these automorphisms to permute the ideal factors $\mathfrak{q}_i$ of the modulus $q$, which was an essential tool toward their proof of the pseudorandomness of RLWE.

Toward Pack and Unpack procedures, our main idea is the following. If $m$ is encoded in the free term as a number in $\{0, \ldots, p-1\}$ and if $m = [[\langle c, s\rangle]_q]_{\mathfrak{p}_i}$, then $m = [[\langle\sigma_{i\to j}(c), \sigma_{i\to j}(s)\rangle]_q]_{\mathfrak{p}_j}$. That is, we can switch the plaintext slot but leave the decrypted message unchanged by applying the same automorphism to the ciphertext and the secret key. (These facts follow from the fact that $\sigma_{i\to j}$ is a homomorphism, that it maps elements of $\mathfrak{p}_i$ to elements of $\mathfrak{p}_j$, and that it fixes free terms.) Of course, then we have a problem: the ciphertext is now under a different key, whereas we may want the ciphertext to be under the same key as other ciphertexts. To get the ciphertexts to be back under the same key, we simply use the SwitchKey algorithm to switch all of the ciphertexts to a new common key.
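The slot-switching identity can be checked on a toy ring-LWE-style ciphertext. In the Python sketch below (toy parameters, illustration only), a pair $(c_0, c_1)$ is built directly from a key $s = (1, s_1)$ so that $\langle c, s\rangle = c_0 + c_1 s_1$ is short modulo $q$; this construction, the parameter values, and the choice of $e_{ij}$ from $a_j^{e_{ij}} = a_i$ are assumptions of the sketch, not the scheme's algorithms. Because $\sigma_{i\to j}$ is a ring homomorphism that only permutes coefficients up to sign, applying it to both $c$ and $s$ commutes with the reduction $[\cdot]_q$ (we take $q$ odd so that the centered representatives are symmetric about 0), and the value found in slot $\mathfrak{p}_i$ reappears in slot $\mathfrak{p}_j$.

```python
import random

d, p, q = 8, 17, 2**40 + 15          # toy sizes; q is odd, so [.]_q is symmetric about 0
a = next(g for g in range(2, p) if pow(g, d, p) == p - 1)
roots = [pow(a, 2 * k - 1, p) for k in range(1, d + 1)]

def ring_mul(x, y, mod):
    """Product in Z_mod[x]/(x^d + 1)."""
    out = [0] * d
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            sign = 1 if i + j < d else -1
            out[(i + j) % d] += sign * xi * yj
    return [v % mod for v in out]

def sigma(r, e):
    """r(x) -> r(x^e) in Z[x]/(x^d + 1), e odd: permutes coefficients up to sign."""
    out = [0] * d
    for k, coef in enumerate(r):
        pos = k * e % (2 * d)
        out[pos % d] += coef if pos < d else -coef
    return out

def centered(r, mod):
    """Coefficient-wise representative in (-mod/2, mod/2]."""
    return [v % mod - mod if v % mod > mod // 2 else v % mod for v in r]

def slot(r, i):
    """r mod p_i, computed as r(a_i) mod p."""
    return sum(c * pow(roots[i - 1], k, p) for k, c in enumerate(r)) % p

def decrypt_slot(c0, c1, s1, i):
    """[[<c, s>]_q]_{p_i} for c = (c0, c1) and s = (1, s1)."""
    inner = [(x + y) % q for x, y in zip(c0, ring_mul(c1, s1, q))]
    return slot(centered(inner, q), i)

random.seed(4)
w = [random.randrange(-3 * p, 3 * p) for _ in range(d)]          # short value of <c, s> mod q
s1 = [random.choice([-1, 0, 1]) for _ in range(d)]
c1 = [random.randrange(q) for _ in range(d)]
c0 = [(wi - v) % q for wi, v in zip(w, ring_mul(c1, s1, q))]      # so c0 + c1*s1 = w (mod q)

i, j = 1, 6
e = (2 * i - 1) * pow(2 * j - 1, -1, 2 * d) % (2 * d)            # a_j^e = a_i
m = decrypt_slot(c0, c1, s1, i)
# Same automorphism on ciphertext and key (sigma fixes the constant 1 in s = (1, s1)).
m_moved = decrypt_slot(sigma(c0, e), sigma(c1, e), sigma(s1, e), j)
print(m, m_moved)
```

After this step the ciphertext is under the key $\sigma_{i\to j}(s)$, which is exactly why the SwitchKey step described above is needed.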

Some technical remarks before we describe Pack and Unpack more formally: We mention again that E.PublicKeyGen is modified in the obvious way so that $A\cdot s = p\cdot e$ rather than $2\cdot e$, and that this modification induces a similar modification in SwitchKeyGen. Also, let $u \in R$ be a short element such that $u \in 1 + \mathfrak{p}_1$ and $u \in \mathfrak{p}_j$ for all $j \ne 1$. It is obvious that such a $u$ with coefficients in $(-p/2, p/2]$ can be computed efficiently by first picking any element $u'$ such that $u' \in 1 + \mathfrak{p}_1$ and $u' \in \mathfrak{p}_j$ for all $j \ne 1$, and then reducing the coefficients of $u'$ modulo $p$.

PackSetup$(s_1, s_2)$: Takes as input two secret keys $s_1, s_2$. For all $i \in [1, d]$, it runs $\tau_{\sigma_{1\to i}(s_1)\to s_2} \leftarrow \mathrm{SwitchKeyGen}(\sigma_{1\to i}(s_1), s_2)$.

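Pack$(\{c_i\}_{i=1}^{d}, \{\tau_{\sigma_{1\to i}(s_1)\to s_2}\}_{i=1}^{d})$: Takes as input ciphertexts $c_1, \ldots, c_d$ such that $m_i = [[\langle c_i, s_1\rangle]_q]_{\mathfrak{p}_1}$ and $0 = [[\langle c_i, s_1\rangle]_q]_{\mathfrak{p}_j}$ for all $j \ne 1$, and also some auxiliary information output by PackSetup. For all $i$, it does the following: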

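•  Computes $c_i^* \leftarrow \sigma_{1\to i}(c_i)$. (Observe: $m_i = [[\langle c_i^*, \sigma_{1\to i}(s_1)\rangle]_q]_{\mathfrak{p}_i}$ while $0 = [[\langle c_i^*, \sigma_{1\to i}(s_1)\rangle]_q]_{\mathfrak{p}_j}$ for all $j \ne i$.)

•  Runs $c_i^\dagger \leftarrow \mathrm{SwitchKey}(\tau_{\sigma_{1\to i}(s_1)\to s_2}, c_i^*)$. (Observe: Assuming the noise does not wrap, we have that $m_i = [[\langle c_i^\dagger, s_2\rangle]_q]_{\mathfrak{p}_i}$ and $0 = [[\langle c_i^\dagger, s_2\rangle]_q]_{\mathfrak{p}_j}$ for all $j \ne i$.)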

Finally, it outputs $c \leftarrow \sum_{i=1}^{d} c_i^\dagger$. (Observe: Assuming the noise does not wrap, we have that $m_i = [[\langle c, s_2\rangle]_q]_{\mathfrak{p}_i}$ for all $i$.)

UnpackSetup$(s_1, s_2)$: Takes as input two secret keys $s_1, s_2$. For all $i \in [1, d]$, it runs $\tau_{\sigma_{i\to 1}(s_1)\to s_2} \leftarrow \mathrm{SwitchKeyGen}(\sigma_{i\to 1}(s_1), s_2)$.
Unpack$(c, \{\tau_{\sigma_{i\to 1}(s_1)\to s_2}\}_{i=1}^{d})$: Takes as input a ciphertext $c$ such that $m_i = [[\langle c, s_1\rangle]_q]_{\mathfrak{p}_i}$ for all $i$, and also some auxiliary information output by UnpackSetup. For all $i$, it does the following:

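•  Computes $c_i^* \leftarrow \sigma_{i\to 1}(c)$. (Observe: Assuming the noise does not wrap, $m_i = [[\langle c_i^*, \sigma_{i\to 1}(s_1)\rangle]_q]_{\mathfrak{p}_1}$.)

•  Outputs $c_i \leftarrow \mathrm{SwitchKey}(\tau_{\sigma_{i\to 1}(s_1)\to s_2}, c_i^*)$. (Observe: Assuming the noise does not wrap, $m_i = [[\langle c_i, s_2\rangle]_q]_{\mathfrak{p}_1}$.)

Unlike in Pack, the other slots $R_{\mathfrak{p}_j}$, $j \ne 1$, of $c_i^*$ and $c_i$ generally do not decrypt to 0: $\sigma_{i\to 1}$ permutes the slots, so they hold the remaining messages in some order. This is harmless, since normal homomorphic operations use only the slot $R_{\mathfrak{p}_1}$ and the slots do not interact.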
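To see the slot bookkeeping of Pack and Unpack in isolation, the Python sketch below tracks only the decrypted plaintexts in $R_p$ and omits encryption, noise, and SwitchKey (which leaves the decrypted plaintext unchanged when the noise does not wrap). As a hypothetical setup, the $k$-th Pack input decrypts to $m_k\cdot u$, i.e., $m_k$ in slot $\mathfrak{p}_1$ and 0 elsewhere, with the short element $u$ computed by CRT interpolation and centered coefficients; the toy parameters and the choice of $e_{ij}$ are also assumptions. Pack applies $\sigma_{1\to k}$ to the $k$-th input and sums; Unpack applies $\sigma_{k\to 1}$ to the packed element and reads slot $\mathfrak{p}_1$.

```python
import random

d, p = 8, 17                                  # toy parameters: p = 1 (mod 2d)
a = next(g for g in range(2, p) if pow(g, d, p) == p - 1)
roots = [pow(a, 2 * k - 1, p) for k in range(1, d + 1)]

def slots(r):
    """(r mod p_1, ..., r mod p_d) for r in R_p."""
    return [sum(c * pow(ai, k, p) for k, c in enumerate(r)) % p for ai in roots]

def crt(values):
    """r in R_p with r mod p_i = values[i-1], coefficients centered in (-p/2, p/2]."""
    d_inv = pow(d, -1, p)
    coeffs = [d_inv * sum(v * pow(ai, -k, p) for v, ai in zip(values, roots)) % p
              for k in range(d)]
    return [c - p if c > p // 2 else c for c in coeffs]

def sigma(r, i, j):
    """sigma_{i->j}: r(x) -> r(x^e) mod (x^d + 1, p), with e chosen so that a_j^e = a_i."""
    e = (2 * i - 1) * pow(2 * j - 1, -1, 2 * d) % (2 * d)
    out = [0] * d
    for k, coef in enumerate(r):
        pos = k * e % (2 * d)
        out[pos % d] += coef if pos < d else -coef
    return [c % p for c in out]

u = crt([1] + [0] * (d - 1))                  # u in 1 + p_1 and u in p_j for j != 1

def pack(msgs):
    """Plaintext view of Pack: sum over k of sigma_{1->k}(m_k * u)."""
    total = [0] * d
    for k, m in enumerate(msgs, start=1):
        moved = sigma([m * c % p for c in u], 1, k)      # m_k now in slot k, zeros elsewhere
        total = [(x + y) % p for x, y in zip(total, moved)]
    return total

def unpack(packed):
    """Plaintext view of Unpack: slot 1 of sigma_{k->1}(packed) for each k."""
    # The other slots of sigma_{k->1}(packed) hold the remaining messages in permuted order.
    return [slots(sigma(packed, k, 1))[0] for k in range(1, d + 1)]

random.seed(5)
msgs = [random.randrange(p) for _ in range(d)]
packed = pack(msgs)
print(slots(packed) == msgs, unpack(packed) == msgs)
```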

Splicing  the  Pack  and  Unpack  procedures  into  our  scheme  FHE  is  tedious  but  pretty  straightforward. 
Although  these  procedures  introduce  many  more  encrypted  secret  keys,  this  does  not  cause  a  circular  security 
problem  as  long  as  the  chain  of  encrypted  secret  keys  is  acyclic;  then  the  standard  hybrid  argument  applies. 
After  applying  Pack  or  Unpack,  one  may  apply  modulus  reduction  to  reduce  the  noise  back  down  to  normal. 

5.4  More  Fun  with  Funky  Plaintext  Spaces 

In some cases, it might be nice to have a plaintext space isomorphic to $\mathbb{Z}_p$ for some large prime $p$, e.g., one exponential in the security parameter. So far, we have been using $R_p$ as our plaintext space, and (due to the rounding step in modulus switching) the size of the noise after modulus switching is proportional to $p$. When $p$ is exponential, our previous approach for handling the noise (which keeps the magnitude of the noise polynomial in $\lambda$) obviously breaks down.

To get a plaintext space isomorphic to $\mathbb{Z}_p$ that works for exponential $p$, we need a new approach. Instead of using an integer modulus, we will use an ideal modulus $I$ (an ideal of $R$) whose norm is some large prime $p$, but such that we have a basis $B_I$ of $I$ that is very short, e.g., $\|B_I\| = O(\mathrm{poly}(d)\cdot p^{1/d})$. Using an ideal plaintext space forces us to modify the modulus switching technique nontrivially.

Originally, when our plaintext space was $R_2$, each of the moduli in our "ladder" was odd; that is, they were all congruent to each other modulo 2 and relatively prime to 2. Similarly, we will have to choose each of the moduli in our new ladder so that they are all congruent to each other modulo $I$. (This just seems necessary to get the scaling to work, as the reader will see shortly.) This presents a difficulty, since we wanted the norm of $I$ to be large, e.g., exponential in the security parameter. If we choose our moduli $q_j$ to be integers, then we have that the integer $q_{j+1} - q_j \in I$. In particular, $q_{j+1} - q_j$ is a multiple of $I$'s norm, implying that the $q_j$'s are exponential in the security parameter. Having such large $q_j$'s does not work well in our scheme, for two reasons. First, the underlying lattice problems become easy when $q_j/B$ is exponential in $d$, where $B$ is a bound on the noise distribution of fresh ciphertexts. Second, we need $B$ to remain quite small for our new noise management approach to work effectively. So, instead, our ladder of moduli will also consist of ideals, in particular, principal ideals $(q_j)$ generated by an element $q_j \in R$. Specifically, it is easy to generate a ladder of $q_j$'s that are all congruent to 1 modulo $I$. We sample appropriately-sized elements $q_j$ of the coset $1 + I$ (using our short basis of $I$), and test whether the principal ideal $(q_j)$ generated by the element has appropriate norm.
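The sampling procedure just described can be sketched in a toy setting. The code below is a minimal illustration under stated assumptions, not the authors' implementation. It makes the following choices:

- It takes $R = \mathbb{Z}[x]/(x^4 + 1)$ and a hypothetical plaintext ideal $I = (x + 2)$ whose norm $2^4 + 1 = 17$ is prime.
- It computes the norm of a principal ideal $(q)$ as the absolute determinant of the rotation basis of $q$.
- It samples elements $1 + \sum_i z_i b_i$ of the coset $1 + I$ from the basis $B_I$, and keeps those whose norm falls in a target window.

The window, the coefficient bound and all helper names are illustrative choices. Real parameters use a large $d$ and an ideal of exponential norm.

```python
import random

D = 4  # toy ring R = Z[x]/(x^D + 1); real parameters use a much larger D


def ring_mul(a, b):
    """Multiply coefficient vectors a, b in Z[x]/(x^D + 1)."""
    out = [0] * D
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < D:
                out[i + j] += ai * bj
            else:
                out[i + j - D] -= ai * bj  # reduce using x^D = -1
    return out


def rotation_basis(q):
    """Rows are the coefficient vectors of x^i * q(x) mod (x^D + 1)."""
    return [ring_mul([int(k == i) for k in range(D)], q) for i in range(D)]


def det(m):
    """Exact integer determinant via fraction-free (Bareiss) elimination."""
    m = [row[:] for row in m]
    n, sign, prev = len(m), 1, 1
    for k in range(n - 1):
        if m[k][k] == 0:
            swap = next((r for r in range(k + 1, n) if m[r][k] != 0), None)
            if swap is None:
                return 0
            m[k], m[swap] = m[swap], m[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                m[i][j] = (m[i][j] * m[k][k] - m[i][k] * m[k][j]) // prev
        prev = m[k][k]
    return sign * m[n - 1][n - 1]


def ideal_norm(q):
    """Norm of the principal ideal (q): |det| of the rotation basis of q."""
    return abs(det(rotation_basis(q)))


G = [2, 1, 0, 0]         # hypothetical I = (x + 2); its norm 2^4 + 1 = 17 is prime
B_I = rotation_basis(G)  # a short basis of I


def sample_one_plus_I(rng, bound):
    """Return 1 + sum_i z_i * b_i with small random integers z_i, b_i in B_I."""
    q = [1] + [0] * (D - 1)
    for b in B_I:
        z = rng.randint(-bound, bound)
        q = [qi + z * bi for qi, bi in zip(q, b)]
    return q


def build_ladder(count, lo, hi, bound=20, seed=1, max_tries=10_000):
    """Collect `count` elements of 1 + I whose principal ideals have norm in [lo, hi]."""
    rng = random.Random(seed)
    found = []
    for _ in range(max_tries):
        q = sample_one_plus_I(rng, bound)
        if lo <= ideal_norm(q) <= hi:
            found.append(q)
            if len(found) == count:
                return sorted(found, key=ideal_norm, reverse=True)
    raise RuntimeError("no suitable moduli found; widen the norm window")


ladder = build_ladder(3, 10**5, 10**9)
for q in ladder:
    print(q, ideal_norm(q))
```

Every sampled $q_j$ lies in $1 + I$, so any two rungs differ by an element of $I$. As a quick sanity check, the norm of their difference is therefore divisible by $N(I) = 17$.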

Now, let us reconsider modulus switching in light of the fact that our moduli are now principal ideals. We need an analogue of Lemma 4 that works for ideal moduli.

Let us build up some notation and concepts that we will need in our new lemma.

- Let $\mathcal{P}_q$ be the half-open parallelepiped associated to the rotation basis of $q \in R$. The rotation basis $B_q$ of $q$ is the $d$-dimensional basis formed by the coefficient vectors of the polynomials $x^i q(x) \bmod f(x)$ for $i \in [0, d-1]$. The associated parallelepiped is $\mathcal{P}_q = \{\sum_i z_i \cdot b_i : b_i \in B_q,\ z_i \in [-1/2, 1/2)\}$.

We need two concepts associated to this parallelepiped.

- First, we will still use the notation $[a]_q$, but $q$ is now an $R$-element rather than an integer. This notation refers to $a$ reduced modulo the rotation basis of $q$, i.e., the element $[a]_q$ such that $[a]_q - a \in qR$ and $[a]_q \in \mathcal{P}_q$.
- Next, we need the notions of the inner radius $r_{q,\mathrm{in}}$ and outer radius $r_{q,\mathrm{out}}$ of $\mathcal{P}_q$. These are the radius of the largest ball inscribed in $\mathcal{P}_q$ and the radius of the smallest ball that circumscribes $\mathcal{P}_q$, respectively.

It is possible to choose $q$ so that the ratio $r_{q,\mathrm{out}}/r_{q,\mathrm{in}}$ is $\mathrm{poly}(d)$. For example, this is true when $q$ is an integer.
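Below is a minimal sketch of $[a]_q$ for a ring element $q$, assuming the toy ring $\mathbb{Z}[x]/(x^4 + 1)$. It proceeds in three steps:

1. Express $a$ in the coordinates of the rotation basis $B_q$.
2. Round each coordinate so that the remaining fractional part lies in $[-1/2, 1/2)$.
3. Subtract the corresponding multiple $k \cdot q$.

When $q$ is an integer this collapses to the usual centered reduction of each coefficient. Exact rational arithmetic (`fractions.Fraction`) stands in for whatever arithmetic a real implementation would use, and the helper names are hypothetical.

```python
import math
from fractions import Fraction

D = 4  # toy ring R = Z[x]/(x^D + 1)


def ring_mul(a, b):
    """Multiply coefficient vectors a, b in Z[x]/(x^D + 1)."""
    out = [0] * D
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < D:
                out[i + j] += ai * bj
            else:
                out[i + j - D] -= ai * bj  # reduce using x^D = -1
    return out


def rotation_basis(q):
    """Rows are the coefficient vectors of x^i * q(x) mod (x^D + 1)."""
    return [ring_mul([int(k == i) for k in range(D)], q) for i in range(D)]


def solve_left(B, a):
    """Rational row vector z with z * B = a (B square and invertible)."""
    n = len(B)
    # Gauss-Jordan elimination on the augmented system B^T z^T = a^T.
    m = [[Fraction(B[j][i]) for j in range(n)] + [Fraction(a[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def reduce_mod(a, q):
    """Return ([a]_q, k) where [a]_q = a - k*q lies in the parallelepiped P_q."""
    z = solve_left(rotation_basis(q), a)               # coordinates of a in B_q
    k = [math.floor(zi + Fraction(1, 2)) for zi in z]  # z_i - k_i in [-1/2, 1/2)
    kq = ring_mul(k, q)                                # sum_i k_i * x^i * q(x)
    return [ai - ci for ai, ci in zip(a, kq)], k


# Integer q: coefficient-wise centered residues modulo 7.
print(reduce_mod([10, -4, 3, 20], [7, 0, 0, 0])[0])   # [3, 3, 3, -1]
```

For a general $q$, the returned $k$ certifies $a - [a]_q = k \cdot q \in qR$. The coordinates of $[a]_q$ in $B_q$ are exactly $z - k$, so they fall in $[-1/2, 1/2)$ by construction.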

For a suitable value of $f(x)$ that determines our ring, such as $f(x) = x^d + 1$, the expected value of this ratio will be $\mathrm{poly}(d)$ even if $q$ is sampled uniformly (e.g., according to a discrete Gaussian distribution centered at 0). More generally, we will refer to $r_{B,\mathrm{out}}$ as the outer radius associated to the parallelepiped determined by basis $B$.

Also, consider the field $\mathbb{Q}[x]/(f(x))$ overlying this ring. If $q$ is sampled uniformly, it will be true with overwhelming probability that $\|q^{-1}\| = 1/\|q\|$ up to a $\mathrm{poly}(d)$ factor. For convenience, let $\alpha(d)$ be a polynomial satisfying two conditions with overwhelming probability:

- $\|q^{-1}\| = 1/\|q\|$ up to an $\alpha(d)$ factor;
- $r_{q,\mathrm{out}}/r_{q,\mathrm{in}}$ is at most $\alpha(d)$.

For such an $\alpha$, we say $q$ is $\alpha$-good. Finally, in the lemma, $\gamma_R$ denotes the expansion factor of $R$, i.e., $\gamma_R = \max\{\|a \cdot b\|/(\|a\|\,\|b\|) : a, b \in R\}$.
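Take $f(x) = x^d + 1$ with the Euclidean norm. Every coefficient of $a \cdot b \bmod f$ is a signed inner product of $a$ with a signed rotation of $b$, so Cauchy-Schwarz gives $\gamma_R \le \sqrt{d}$. The sketch below is an illustration only, not part of the scheme. It samples random small ring elements to obtain an empirical lower estimate of $\gamma_R$ and compares it with that $\sqrt{d}$ upper bound. The sampling range and trial counts are arbitrary choices.

```python
import math
import random


def ring_mul(a, b):
    """Multiply coefficient vectors a, b in Z[x]/(x^d + 1), d = len(a)."""
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < d:
                out[i + j] += ai * bj
            else:
                out[i + j - d] -= ai * bj  # x^d = -1
    return out


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def estimate_expansion(d, trials=2000, seed=0):
    """Largest observed ||a*b|| / (||a|| ||b||): a lower estimate of gamma_R."""
    rng = random.Random(seed)
    best = 1.0  # a = b = 1 already attains ratio exactly 1
    for _ in range(trials):
        a = [rng.randint(-3, 3) for _ in range(d)]
        b = [rng.randint(-3, 3) for _ in range(d)]
        if any(a) and any(b):
            best = max(best, l2(ring_mul(a, b)) / (l2(a) * l2(b)))
    return best


for d in (4, 8, 16):
    print(d, round(estimate_expansion(d), 3), "<=", round(math.sqrt(d), 3))
```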

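Lemma 11. Let $q_1$ and $q_2$, with $\|q_1\| < \|q_2\|$, be two $\alpha$-good elements of $R$. Let $B_I$ be a short basis (with outer radius $r_{B_I,\mathrm{out}}$) of an ideal $I$ of $R$ such that $q_1 - q_2 \in I$. Let $c$ be an integer vector and $c' \leftarrow \mathsf{Scale}(c, q_2, q_1, I)$. That is, $c'$ is an $R$-element at most $2r_{B_I,\mathrm{out}}$ distant from $(q_1/q_2) \cdot c$ such that $c' - c \in I$. Then, for any $s$ with

$$\|[\langle c, s\rangle]_{q_2}\| < \left(\frac{r_{q_2,\mathrm{in}}}{\alpha(d)^2} - \frac{\|q_2\|}{\|q_1\|}\,\gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s)\right) \Big/ \left(\alpha(d) \cdot \gamma_R\right)$$

we have

$$[\langle c', s\rangle]_{q_1} = [\langle c, s\rangle]_{q_2} \bmod I \quad\text{and}\quad \|[\langle c', s\rangle]_{q_1}\| < \alpha(d) \cdot \gamma_R \cdot \frac{\|q_1\|}{\|q_2\|} \cdot \|[\langle c, s\rangle]_{q_2}\| + \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s),$$

where $\ell_1^{(R)}(s)$ is defined as $\sum_i \|s[i]\|$.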
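Proof. We have

$$[\langle c, s\rangle]_{q_2} = \langle c, s\rangle - k q_2$$

for some $k \in R$. For the same $k$, let

$$e_{q_1} = \langle c', s\rangle - k q_1 \in R.$$

Note that $e_{q_1} = [\langle c', s\rangle]_{q_1} \bmod q_1$. We claim that $\|e_{q_1}\|$ is so small that $e_{q_1} = [\langle c', s\rangle]_{q_1}$. We have:

$$\begin{aligned}
\|e_{q_1}\| &= \|-k q_1 + \langle (q_1/q_2) \cdot c, s\rangle + \langle c' - (q_1/q_2) \cdot c, s\rangle\| \\
&\le \|-k q_1 + \langle (q_1/q_2) \cdot c, s\rangle\| + \|\langle c' - (q_1/q_2) \cdot c, s\rangle\| \\
&\le \gamma_R \cdot \|q_1/q_2\| \cdot \|[\langle c, s\rangle]_{q_2}\| + \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s) \\
&\le \gamma_R \cdot \|q_1\| \cdot \|q_2^{-1}\| \cdot \|[\langle c, s\rangle]_{q_2}\| + \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s) \\
&\le \alpha(d) \cdot \gamma_R \cdot (\|q_1\|/\|q_2\|) \cdot \|[\langle c, s\rangle]_{q_2}\| + \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s)
\end{aligned}$$

By the final expression above, the magnitude of $e_{q_1}$ may actually be less than the magnitude of $e_{q_2} = [\langle c, s\rangle]_{q_2}$ if $\|q_1\|/\|q_2\|$ is small enough. Let us continue with the inequalities, substituting in the bound for $\|[\langle c, s\rangle]_{q_2}\|$:

$$\begin{aligned}
\|e_{q_1}\| &< \alpha(d) \cdot \gamma_R \cdot (\|q_1\|/\|q_2\|) \cdot \left(\frac{r_{q_2,\mathrm{in}}}{\alpha(d)^2} - \frac{\|q_2\|}{\|q_1\|}\,\gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s)\right) \Big/ \left(\alpha(d) \cdot \gamma_R\right) + \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s) \\
&\le (\|q_1\|/\|q_2\|) \cdot \left(\frac{r_{q_2,\mathrm{in}}}{\alpha(d)^2} - \frac{\|q_2\|}{\|q_1\|}\,\gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s)\right) + \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s) \\
&\le \left(r_{q_1,\mathrm{in}} - \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s)\right) + \gamma_R \cdot 2r_{B_I,\mathrm{out}} \cdot \ell_1^{(R)}(s) \\
&= r_{q_1,\mathrm{in}}
\end{aligned}$$

Since $\|e_{q_1}\| < r_{q_1,\mathrm{in}}$, $e_{q_1}$ is inside the parallelepiped $\mathcal{P}_{q_1}$, and it is indeed true that $e_{q_1} = [\langle c', s\rangle]_{q_1}$. Furthermore, we have $[\langle c', s\rangle]_{q_1} = e_{q_1} = \langle c', s\rangle - k q_1 = \langle c, s\rangle - k q_2 = [\langle c, s\rangle]_{q_2} \bmod I$. □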

The  bottom  line  is  that  we  can  apply  the  modulus  switching  technique  to  moduli  that  are  ideals,  and  this 
allows  us  to  use,  if  desired,  plaintext  spaces  that  are  very  large  (exponential  in  the  security  parameter)  and 
that  have  properties  that  are  often  desirable  (such  as  being  isomorphic  to  a  large  prime  field). 
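The sketch below makes the mechanics concrete in the simplest special case, $R = \mathbb{Z}$ and $I = p\mathbb{Z}$. Here $d = 1$, the rotation basis of an integer is the integer itself, and $\gamma_R = \alpha = 1$. `scale` plays the role of Scale in Lemma 11: coordinate by coordinate, it picks the integer closest to $(q_1/q_2) \cdot c$ in the same residue class modulo $p$, which is within $p/2$ of the target. The checks confirm two things. The noise keeps its residue modulo $p$ (hence the plaintext is preserved), and it obeys the triangle-inequality bound used in the proof. The parameters are illustrative and far too small for security, and the helper names are hypothetical.

```python
import random
from fractions import Fraction


def centered(x, q):
    """Representative of x mod q in [-q/2, q/2)."""
    r = x % q
    return r - q if 2 * r >= q else r


def inner(u, v):
    return sum(x * y for x, y in zip(u, v))


def scale(c, q_from, q_to, p):
    """Integer vector c' closest to (q_to/q_from)*c subject to c' = c (mod p)."""
    out = []
    for ci in c:
        target = Fraction(q_to * ci, q_from)
        out.append(ci + p * round((target - ci) / p))  # |c'_i - target| <= p/2
    return out


# Toy parameters (illustrative only): plaintext ideal I = pZ and two moduli
# that are congruent modulo p, as the ladder construction requires.
p = 257
q2 = p * 2**30 + 1   # current (larger) modulus
q1 = p * 2**20 + 1   # next (smaller) modulus; q1 = q2 (mod p)
n = 8

rng = random.Random(7)
s = [1] + [rng.choice([-1, 0, 1]) for _ in range(n - 1)]  # small secret vector
m = 42                                                    # plaintext in Z_p
e2 = m + p * rng.randint(-10, 10)                         # noise, e2 = m (mod p)
a = [rng.randrange(q2) for _ in range(n - 1)]
c = [(e2 - inner(a, s[1:])) % q2] + a                     # <c, s> = e2 (mod q2)

c1 = scale(c, q2, q1, p)
e1 = centered(inner(c1, s), q1)
bound = Fraction(q1, q2) * abs(e2) + Fraction(p, 2) * sum(abs(x) for x in s)
print(f"noise before: {e2}, after: {e1}, bound: {float(bound):.1f}")
print("same plaintext mod p:", e1 % p == m)
```

The noise shrinks roughly by the factor $q_1/q_2$, up to the additive rounding term $(p/2)\sum_i |s_i|$. That additive term is the reason the lemma needs $s$ to be small.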

5.5  Other  Optimizations 

If one is willing to assume circular security, the keys $\{s_j\}$ may all be the same, thereby permitting a public key of size independent of $L$.

While  it  is  not  necessary,  squashing  may  still  be  a  useful  optimization  in  practice,  as  it  can  be  used  to 
lower  the  depth  of  the  decryption  function,  thereby  reducing  the  size  of  the  largest  modulus  needed  in  the 
scheme,  which  may  improve  efficiency. 

For the LWE-based scheme, one can use key switching to gradually reduce the dimension $n_j$ of the ciphertext (and secret key) vectors as $q_j$ decreases, that is, as one traverses to higher levels in the circuit. As $q_j$ decreases, the associated LWE problem becomes (we believe) progressively harder (for a fixed noise distribution $\chi$). This allows one to gradually reduce the dimension $n_j$ without sacrificing security. Ciphertext length then shrinks faster as one goes higher in the circuit than it would by decreasing $q_j$ alone.

6  Summary  and  Future  Directions 

Our RLWE-based FHE scheme without bootstrapping requires only $\tilde{O}(\lambda \cdot L^3)$ per-gate computation, where $L$ is the depth of the circuit being evaluated, while the bootstrapped version has only $\tilde{O}(\lambda^2)$ per-gate computation. For circuits of width $\Omega(\lambda)$, we can use batching to reduce the per-gate computation of the bootstrapped version by another factor of $\lambda$.

While  these  schemes  should  perform  significantly  better  than  previous  FHE  schemes,  we  caution  that  the 
polylogarithmic  factors  in  the  per-gate  computation  are  large.  One  future  direction  toward  a  truly  practical 
scheme  is  to  tighten  up  these  polylogarithmic  factors  considerably. 
Targeted Malleability:

Homomorphic  Encryption  for  Restricted  Computations 

Dan Boneh* Gil Segev† Brent Waters‡


Abstract 

We  put  forward  the  notion  of  targeted  malleability:  given  a  homomorphic  encryption  scheme, 
in  various  scenarios  we  would  like  to  restrict  the  homomorphic  computations  one  can  perform 
on  encrypted  data.  We  introduce  a  precise  framework,  generalizing  the  foundational  notion 
of  non-malleability  introduced  by  Dolev,  Dwork,  and  Naor  (SICOMP  ’00),  ensuring  that  the 
malleability  of  a  scheme  is  targeted  only  at  a  specific  set  of  “allowable”  functions. 

In  this  setting  we  are  mainly  interested  in  the  efficiency  of  such  schemes  as  a  function  of  the 
number  of  repeated  homomorphic  operations.  Whereas  constructing  a  scheme  whose  ciphertext 
grows  linearly  with  the  number  of  such  operations  is  straightforward,  obtaining  more  realistic 
(or  merely  non-trivial)  length  guarantees  is  significantly  more  challenging. 

We  present  two  constructions  that  transform  any  homomorphic  encryption  scheme  into  one 
that  offers  targeted  malleability.  Our  constructions  rely  on  standard  cryptographic  tools  and 
on  succinct  non-interactive  arguments,  which  are  currently  known  to  exist  in  the  standard 
model  based  on  variants  of  the  knowledge-of-exponent  assumption.  The  two  constructions  offer 
somewhat  different  efficiency  guarantees,  each  of  which  may  be  preferable  depending  on  the 
underlying  building  blocks. 


Keywords:  Homomorphic  encryption,  Non-malleable  encryption. 


*Stanford University. Supported by NSF, DARPA, and AFOSR.
†Microsoft Research, Mountain View, CA 94043, USA.

‡University of Texas at Austin. Supported by NSF CNS-0716199, CNS-0915361, and CNS-0952692, DARPA PROCEED, Air Force Office of Scientific Research (AFOSR) MURI, DHS Grant 2006-CS-001-000001-02, and the Sloan Foundation.


3.  Targeted  Malleability 


1  Introduction 


Fully homomorphic encryption [RAD78, Gen09, SV10, vDGH+10] is a remarkable development in cryptography, enabling anyone to compute arbitrary functions on encrypted data. In many settings, however, the data owner may wish to restrict the class of homomorphic computations to a certain set $\mathcal{F}$ of allowable functions. In this paper we put forward the notion of "targeted malleability". Given an encryption scheme that supports homomorphic operations with respect to some set of functions $\mathcal{F}$, we would like to ensure that the malleability of the scheme is targeted only at the set $\mathcal{F}$. That is, it should not be possible to apply any homomorphic operation other than the ones in $\mathcal{F}$.

Enforcing targeted malleability can be done simply by requiring the entity performing the homomorphic operation to embed a proof in the ciphertext. The proof shows that the ciphertext was computed using an allowable function. The decryptor then verifies the proof before decrypting the ciphertext, and outputs $\perp$ if the proof is invalid. Unfortunately, the number of proofs grows as the homomorphic operation is repeated. The ciphertext therefore grows at least linearly with the number of repeated homomorphic operations. It is not difficult to see that targeted malleability with a linear-size ciphertext is trivial to construct: use any non-malleable encryption scheme, and embed in the ciphertext a description of all the functions being computed. The decryptor decrypts the original ciphertext and applies the embedded functions to it (verifying, of course, that these functions are in the allowable set).
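The trivial linear-size construction can be sketched as follows, under loudly stated assumptions:

- The non-malleable encryption scheme is replaced by an insecure placeholder.
- Functions are identified by hypothetical names in a small registry.

The point is only the ciphertext layout. Each homomorphic operation appends one function description, so ciphertext length grows linearly with the number of operations. Avoiding this growth is the goal of the constructions in this paper.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a non-malleable encryption scheme. It models only
# the interface (an opaque inner ciphertext) and provides NO security at all.
def nm_encrypt(m):
    return ("nm-ct", m)


def nm_decrypt(inner):
    return inner[1]


# Allowable set, given as named function descriptions (illustrative).
ALLOWED = {"inc": lambda x: x + 1, "double": lambda x: 2 * x}


@dataclass(frozen=True)
class Ciphertext:
    inner: tuple
    functions: tuple = ()  # one entry per homomorphic operation: grows linearly


def encrypt(m):
    return Ciphertext(nm_encrypt(m))


def hom_eval(ct, name):
    """Append the description of the function instead of computing on data."""
    return Ciphertext(ct.inner, ct.functions + (name,))


def decrypt(ct):
    """Decrypt the original message, then apply each embedded function in order.
    Returns None (standing for the symbol ⊥) if a function is not allowable."""
    x = nm_decrypt(ct.inner)
    for name in ct.functions:
        if name not in ALLOWED:
            return None
        x = ALLOWED[name](x)
    return x


ct = encrypt(5)
for name in ["inc", "double", "inc"]:
    ct = hom_eval(ct, name)
print(decrypt(ct), len(ct.functions))  # 13 3
```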

Minimizing ciphertext expansion. Targeted malleability is much harder to construct once we require that ciphertext growth be at most sub-linear in the number of repeated homomorphic operations. Our goal is to construct systems where even after $t$ applications of the homomorphic operation the ciphertext length does not increase much. In our main construction we are able to completely shift the dependence on $t$ from the ciphertext to the public key: the ciphertext size is essentially independent of $t$. This is a natural goal, since public keys are typically much more static than ciphertexts, which are frequently generated and transmitted.

Motivation.  While  targeted  malleability  is  an  interesting  concept  in  its  own  right,  it  has  many 
applications  in  cryptography  and  beyond.  We  give  a  few  illustrative  examples: 

• A spam filter implemented in a mail server adds a spam tag to encrypted emails whose content satisfies a certain spam predicate. The filter should be allowed to run the spam predicate, but should not modify the email contents. In this case, the set of allowable functions $\mathcal{F}$ would be the set of allowable spam predicates and nothing else. As email passes from one server to the next, each server homomorphically computes its spam predicate on the encrypted output of the previous server. Each spam filter in the chain can run its chosen spam predicate and nothing else.

• More generally, in a distributed system users initiate encrypted requests to various servers. To service a request, a server may need to contact another server, and that server may need to contact another. The result is a chain of messages from server to server until the transaction is fulfilled. Each server along the way has an allowed set of operations it can apply to a received message, and it should be unable to apply any operation outside this approved set.

• In a voting system based on homomorphic encryption (e.g., [CGS97]), voters take turns incrementing an encrypted vote tally using a homomorphic operation. They are only allowed to increase the encrypted tally by 1 (indicating a vote for the candidate) or by 0 (indicating no vote for the candidate). In elections where each voter votes for one of $t$ candidates, voters modify the encrypted tallies by adding a $t$-bit vector, where exactly one entry is 1 and the rest are all 0's. They should be unable to modify the counters in any other way.
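The voting example can be sketched with additively homomorphic "exponential" ElGamal. This is an assumption for illustration, not a scheme named in the text, and the group parameters are tiny and insecure. Multiplying ciphertexts componentwise adds the plaintexts, so each voter updates the $t$ encrypted counters with an encrypted 0/1 vector. The allowable updates are the $t$-bit vectors with exactly one 1. Nothing in this toy enforces that restriction, however: `is_allowed_ballot` runs on the voter's own plaintext, so a dishonest voter can skip it. That gap is what targeted malleability is meant to close.

```python
import random

# Toy exponential ElGamal (insecure parameters): p = 2q + 1 with q prime,
# and g = 4 generates the subgroup of order q.
P, Q, G = 1019, 509, 4


def keygen(rng):
    x = rng.randrange(1, Q)
    return pow(G, x, P), x  # (public key h = g^x, secret key x)


def enc(h, m, rng):
    r = rng.randrange(1, Q)
    return pow(G, r, P), pow(G, m, P) * pow(h, r, P) % P


def add(c, d):
    """Componentwise product: an encryption of the sum of the two plaintexts."""
    return c[0] * d[0] % P, c[1] * d[1] % P


def dec(x, c):
    gm = c[1] * pow(pow(c[0], x, P), -1, P) % P
    return next(m for m in range(Q) if pow(G, m, P) == gm)  # small discrete log


def is_allowed_ballot(bits):
    """The allowable updates: t-bit vectors with exactly one entry equal to 1."""
    return all(b in (0, 1) for b in bits) and sum(bits) == 1


rng = random.Random(3)
h, x = keygen(rng)
tallies = [enc(h, 0, rng) for _ in range(3)]  # t = 3 candidates
for ballot in ([1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]):
    assert is_allowed_ballot(ballot)
    tallies = [add(t, enc(h, b, rng)) for t, b in zip(tallies, ballot)]
print([dec(x, t) for t in tallies])  # [3, 1, 1]

# Nothing in the ciphertexts stops a voter from adding 5 instead of 1:
cheat = add(tallies[0], enc(h, 5, rng))
print(dec(x, cheat), is_allowed_ballot([5, 0, 0]))  # 8 False
```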

In  all  these  examples  there  is  a  need  to  repeatedly  apply  a  restricted  homomorphic  operation 
on  encrypted  data.  Limiting  ciphertext  expansion  is  highly  desirable. 

1.1  Our  Contributions 

We begin by introducing a precise framework for modeling targeted malleability. Our notion of security generalizes the foundational one of non-malleability due to Dolev, Dwork, and Naor [DDN00]. It is also inspired by the refinements of Bellare and Sahai [BS99], and Pass, Shelat, and Vaikuntanathan [PSV07]. Given a public-key encryption scheme that is homomorphic with respect to a set of functions $\mathcal{F}$, we would like to capture the following intuitive notion of security¹. Consider any efficient adversary that is given an encryption $c$ of a message $m$ and outputs an encryption $c'$ of a message $m'$. It should hold that either:

(1) $m'$ is independent of $m$,
(2) $c' = c$ (and thus $m' = m$), or
(3) $c'$ is obtained by repeatedly applying the homomorphic evaluation algorithm on $c$ using functions $f_1, \ldots, f_\ell \in \mathcal{F}$.

The first two properties are the standard ones for non-malleable encryption. The third property captures our new notion of targeted malleability: we would like to target the malleability of the scheme only at the class $\mathcal{F}$ (we note that by setting $\mathcal{F} = \emptyset$ we recover the standard definition of non-malleability)². We consider this notion of security with respect to both chosen-plaintext attacks (CPA) and a-priori chosen-ciphertext attacks (CCA1)³.

We emphasize that we do not make the assumption that the set of functions $\mathcal{F}$ is closed under composition. In particular, our approach is sufficiently general to allow targeting the malleability of a scheme at any subset $\mathcal{F}' \subseteq \mathcal{F}$ of the homomorphic operations that are supported by the scheme. This is significant, for example, when dealing with fully homomorphic schemes, where any set of functions is in fact a subset of the supported homomorphic operations (see Section 1.3 for more details).

Next,  we  present  two  general  transformations  that  transform  any  homomorphic  encryption 
scheme  into  one  that  enjoys  targeted  malleability  for  a  limited  number  of  repeated  homomorphic 
operations.  The  resulting  schemes  are  secure  even  in  the  setting  of  a-priori  chosen-ciphertext 
attacks  (CCA1).  The  two  constructions  offer  rather  different  trade-offs  in  terms  of  efficiency.  In 
this  overview  we  focus  on  our  first  construction,  as  it  already  captures  the  main  ideas  underlying 
our  methodology. 

1.2  Overview  of  Our  Approach 

Our approach is based on bridging between two seemingly conflicting goals. On one hand, we would like to turn the underlying homomorphic scheme into a somewhat non-malleable one. On the other hand, we would like to preserve its homomorphic properties. We demonstrate that the Naor-Yung "double encryption" paradigm for non-malleability [NY90, DDN00, Sah99, Lin06] can be utilized to obtain an interesting balance between these two goals. The structure of ciphertexts in our construction follows the latter paradigm. A ciphertext is a 3-tuple $(c_0, c_1, \pi)$ containing two encryptions of the same message using the underlying encryption scheme under two different keys, along with a proof $\pi$ that the ciphertext is well formed. For ciphertexts that are produced by the encryption algorithm, $\pi$ is a non-interactive zero-knowledge proof. For ciphertexts that are produced by the homomorphic evaluation algorithm, it is a succinct non-interactive argument that need not be zero-knowledge.

¹ For simplicity we focus here on univariate functions and refer the reader to Section 3 for the more general case of multivariate functions.

² We assume in this informal discussion that the adversary outputs a valid ciphertext, but our notion of security in fact considers the more general case; see Section 3.

³ See Section 6 for a discussion on a-posteriori chosen-ciphertext attacks (CCA2) in the setting of homomorphic encryption, following the work of Prabhakaran and Rosulek [PR08].

Specifically, the public key of the scheme consists of:

- two public keys, $pk_0$ and $pk_1$, of the underlying homomorphic scheme;
- a common reference string for a non-interactive zero-knowledge proof system;
- $t$ common reference strings for succinct non-interactive argument systems, where $t$ is a predetermined upper bound on the number of repeated homomorphic operations that can be applied to a ciphertext produced by the encryption algorithm.

The secret key consists of the corresponding secret keys $sk_0$ and $sk_1$. To encrypt a message, we encrypt it under each of $pk_0$ and $pk_1$, and provide a non-interactive zero-knowledge proof that the resulting two ciphertexts are indeed encryptions of the same message. Thus, a ciphertext that is produced by the encryption algorithm has the form $(c_0, c_1, \pi_{\mathsf{ZK}})$.

The homomorphic evaluation algorithm preserves the "double encryption" invariant. Specifically, it takes as input a ciphertext $(c_0, c_1, \pi_{\mathsf{ZK}})$ that was produced by the encryption algorithm and a function $f \in \mathcal{F}$. It first applies the homomorphic evaluation algorithm of the underlying encryption scheme to each of $c_0$ and $c_1$. That is, it computes $c_0^{(1)} = \mathsf{HomEval}_{pk_0}(c_0, f)$ and $c_1^{(1)} = \mathsf{HomEval}_{pk_1}(c_1, f)$. Then, it computes a succinct non-interactive argument $\pi^{(1)}$ for the following statement: there exist a function $f \in \mathcal{F}$ and a ciphertext $(c_0, c_1, \pi_{\mathsf{ZK}})$ such that

- $\pi_{\mathsf{ZK}}$ is accepted by the verifier of the non-interactive zero-knowledge proof system, and
- $c_0^{(1)}$ and $c_1^{(1)}$ are generated from $c_0$ and $c_1$ using $f$ as specified.

We denote the language of the corresponding argument system by $L^{(1)}$, and the resulting ciphertext is of the form $c^{(1)} = (1, c_0^{(1)}, c_1^{(1)}, \pi^{(1)})$. Using succinct arguments prevents the length of ciphertexts from increasing significantly.

More generally, consider a ciphertext of the form $c^{(i)} = (i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$. The homomorphic evaluation algorithm follows the same methodology to produce a ciphertext of the same form, $c^{(i+1)} = (i+1, c_0^{(i+1)}, c_1^{(i+1)}, \pi^{(i+1)})$. It uses a succinct non-interactive argument system for a language $L^{(i+1)}$ stating that there exist a function $f \in \mathcal{F}$ and a ciphertext $c^{(i)}$ that is well-formed with respect to $L^{(i)}$, which were used for generating the current ciphertext.
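The level-indexed ciphertext format and one evaluation step can be sketched as below. Everything cryptographic is a hypothetical stub: SHA-256 stands in for both the underlying HomEval and the succinct argument, purely so that the code runs and the argument has a fixed length. The sketch shows the bookkeeping only:

- the level $i$ selects which of the $t$ reference strings is used;
- the new argument's witness is the function together with the previous ciphertext (including its proof);
- the previous argument is replaced rather than accumulated, so the ciphertext size does not grow with $i$.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Ct:
    level: int    # i: number of homomorphic operations applied so far
    c0: bytes     # ciphertext under pk_0
    c1: bytes     # ciphertext under pk_1
    proof: bytes  # pi_ZK at level 0, the succinct argument pi^(i) afterwards


def stub_hom_eval(pk, c, f_name):
    """Hypothetical stand-in for HomEval_pk(c, f) of the underlying scheme."""
    return hashlib.sha256(pk + c + f_name.encode()).digest()


def stub_prove(crs, statement, witness):
    """Hypothetical stand-in for a succinct argument prover. A hash is NOT a
    proof system; it only mimics an argument whose length is independent of
    the witness."""
    return hashlib.sha256(crs + statement + witness).digest()


def eval_step(pk0, pk1, crs_list, ct, f_name):
    """Produce c^(i+1) from c^(i) using the CRS for language L^(i+1)."""
    i = ct.level
    if i >= len(crs_list):  # t = len(crs_list)
        raise ValueError("bound t on repeated homomorphic operations reached")
    new_c0 = stub_hom_eval(pk0, ct.c0, f_name)
    new_c1 = stub_hom_eval(pk1, ct.c1, f_name)
    statement = (i + 1).to_bytes(4, "big") + new_c0 + new_c1
    witness = f_name.encode() + ct.c0 + ct.c1 + ct.proof  # f and c^(i)
    return Ct(i + 1, new_c0, new_c1, stub_prove(crs_list[i], statement, witness))


def size(ct):
    return len(ct.c0) + len(ct.c1) + len(ct.proof)


t = 5
crs_list = [bytes([j]) * 16 for j in range(t)]
ct = Ct(0, b"fresh-c0", b"fresh-c1", b"pi_ZK-placeholder")
sizes = []
for f in ["f1", "f2", "f3", "f4", "f5"]:
    ct = eval_step(b"pk0", b"pk1", crs_list, ct, f)
    sizes.append(size(ct))
print(ct.level, sizes)  # 5 [96, 96, 96, 96, 96]
```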

On the proof of security. Given an adversary that breaks the targeted malleability of our construction, we construct an adversary that breaks the security of (at least one of) the underlying building blocks. As in [NY90, DDN00, Sah99, Lin06], this boils down to having a simulator that is able to decrypt a ciphertext while having access to only one of the secret keys $sk_0$ and $sk_1$. This, in turn, enables the simulator to attack the public key $pk_b$ for which $sk_b$ is not known, where $b \in \{0, 1\}$. To satisfy our notion of security, however, such a simulator must do more than decrypt a ciphertext. It must also recover a "certification chain" demonstrating that the ciphertext was produced by repeatedly applying the homomorphic evaluation algorithm. That is, given a well-formed ciphertext $c^{(i)} = (i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$, the simulator needs to generate a "certification chain" for $c^{(i)}$ of the form $(c^{(0)}, f^{(0)}, \ldots, c^{(i-1)}, f^{(i-1)}, c^{(i)})$, where:

1. $c^{(0)}$ is an output of the encryption algorithm, which can be decrypted while knowing only one of $sk_0$ and $sk_1$.

2. For every $j \in \{1, \ldots, i\}$ it holds that $c^{(j)}$ is obtained by applying the homomorphic evaluation algorithm on $c^{(j-1)}$ and $f^{(j-1)}$.


For this purpose, we require that the argument systems used in the construction exhibit the following "knowledge extraction" property. For every efficient malicious prover $P^*$ there exists an efficient "knowledge extractor" $\mathsf{Ext}_{P^*}$ with the following guarantee. Whenever $P^*$ outputs a statement $x$ and an argument $\pi$ that are accepted by the verifier, $\mathsf{Ext}_{P^*}$, when given the random coins of $P^*$, can in fact produce a witness $w$ to the validity of $x$ with all but a negligible probability.

By repeatedly applying such extractors the simulator is able to produce a certification chain. Then, given that the initial ciphertext is well-formed (i.e., the same message is encrypted under $pk_0$ and $pk_1$), it can be decrypted using only one of the corresponding secret keys.
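The repeated-extraction step can be sketched as a simple loop. The sketch assumes a hypothetical `extract` callable. Given a ciphertext of level $j > 0$, it returns the witness $(f^{(j-1)}, c^{(j-1)})$ of that ciphertext's argument. The toy usage fakes this by letting each ciphertext carry its own witness, which is not how a real extractor works: in the proof, the extractor works from the prover's random coins.

```python
def certification_chain(ct, level_of, extract):
    """Walk back from c^(i) to c^(0) by repeatedly invoking a knowledge extractor.

    `extract(ct)` is a hypothetical extractor returning the witness (f, prev)
    behind the argument inside `ct`, where prev is the previous ciphertext.
    Returns [c^(0), f^(0), ..., c^(i-1), f^(i-1), c^(i)].
    """
    chain = [ct]
    while level_of(ct) > 0:
        f, prev = extract(ct)
        chain[:0] = [prev, f]  # prepend c^(j-1), f^(j-1)
        ct = prev
    return chain


# Toy usage: each "ciphertext" records its own witness, so extraction is a lookup.
c = {"level": 0, "msg": 7}
for j, f in enumerate(["f0", "f1", "f2"], start=1):
    c = {"level": j, "witness": (f, c)}

chain = certification_chain(c, lambda ct: ct["level"], lambda ct: ct["witness"])
print([x if isinstance(x, str) else x["level"] for x in chain])
# [0, 'f0', 1, 'f1', 2, 'f2', 3]
```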

An alternative trade-off. In our first construction, the length of the ciphertext is essentially independent of $t$, and the public key consists of $t + 1$ common reference strings. In our second construction the number of common reference strings in the public key is only $\log t$. A ciphertext now consists of $\log t$ ciphertexts of the underlying homomorphic scheme and $\log t$ succinct arguments. Such a trade-off may be preferable over the one offered by our first construction in some cases. Examples include using argument systems that are tailored to the NP languages under consideration, or settings where the same common reference string cannot be used for all argument systems (depending, of course, on the length of the longest common reference strings).

The main idea underlying this construction is that the arguments computed by the homomorphic evaluation algorithm form a tree structure instead of a path structure. Specifically, instead of using $t$ argument systems, we use only $d = \log t$ argument systems, where the $i$-th one is used for arguing the well-formedness of a ciphertext after $2^i$ repeated homomorphic operations.

Succinct extractable arguments. As explained above, our construction hinges on the existence of succinct non-interactive argument systems that exhibit a knowledge extractor capable of extracting a witness from any successful prover. Gentry and Wichs [GW11] recently showed that no sub-linear non-interactive argument system can be proven secure by a black-box reduction to a falsifiable assumption. Fortunately, while we need succinct arguments, their lengths need not be sub-linear. It suffices for our purposes that arguments are shorter than the witness by only a multiplicative constant factor (say, 1/4). Therefore the negative result of Gentry and Wichs does not apply to our setting. Nevertheless, all known argument systems that satisfy our needs are either set in the random oracle model or based on non-falsifiable assumptions in the sense of Naor [Nao03].

The first such system was constructed by Micali [Mic00] using the PCP theorem. Computational soundness is proved in the random oracle model [BR93], and the length of the proofs is essentially independent of the length of the witness. Valiant [Val08] observed that the system is extractable as needed for our proof of security. Unfortunately, we inherently cannot use an argument system set in the random oracle model. To see why, consider a fresh ciphertext $c$ which is an encryption of a message $m$. After the first homomorphic operation we obtain a new ciphertext $c'$ containing a proof $\pi$ showing that $c'$ is an encryption of $f(m)$ for some allowable function $f \in \mathcal{F}$. Verifying $\pi$ requires access to the random oracle. Now, consider the second homomorphic operation, resulting in $c''$. The proof embedded in $c''$ must now prove, among other things, that there exists a valid proof $\pi$ showing that $c'$ is a well-formed ciphertext. But since $\pi$'s verifier queries the random oracle, this statement is in $\mathsf{NP}^{\mathcal{O}}$, where $\mathcal{O}$ is a random oracle. Since PCPs do not relativize, it seems that Micali's system cannot be used for our purpose. In fact, there are no known succinct argument systems for proving statements in $\mathsf{NP}^{\mathcal{O}}$. This issue was also pointed out by Chiesa and Tromer [CT10] in a completely different context. They suggested overcoming this difficulty by providing each prover with a smartcard implementing a specific oracle functionality.

Instead, we use a recent succinct non-interactive argument system due to Groth [Gro10] (see also the refinement by Lipmaa [Lip11]). Soundness is based on a variant of the "knowledge of exponent assumption". This is a somewhat non-standard assumption: it essentially states that the required extractor exists by assumption, and it is backed up with evidence in the generic group model. This class of assumptions was introduced by Damgård [Dam91] and extended by Bellare and Palacio [BP04b]. Interestingly, Bellare and Palacio [BP04a] succeeded in falsifying one such assumption using the Decision Diffie-Hellman problem. Groth's argument system is even zero-knowledge, but we primarily use its soundness property (see the discussion in Section 6 on exploiting its zero-knowledge property).

1.3  Related  Work 

The problem of providing certain non-malleability properties for homomorphic encryption schemes was studied by Prabhakaran and Rosulek [PR08]. As a positive result, they presented a variant of the Cramer-Shoup encryption scheme [CS98] that provably supports linear operations and no other operations. There are two main differences between our work and the work of Prabhakaran and Rosulek:

(1) their framework only considers sets of allowable functions that are closed under composition;
(2) their framework does not prevent ciphertext expansion during repeated applications of the homomorphic operation, whereas this is a key goal for our work.

In our work we do not make the assumption that the set of allowable functions $\mathcal{F}$ is closed under composition. Avoiding this assumption has the obvious advantage of capturing a wider class of homomorphic schemes. As already discussed, it has another advantage: we can target the malleability of a scheme at any subset $\mathcal{F}' \subseteq \mathcal{F}$ of its supported homomorphic operations, which may be determined by the specific application in which the scheme is used. This is especially significant when dealing with fully homomorphic schemes. A further advantage is the ability to limit the number of repeated homomorphic operations.

We note that when the set of functions $\mathcal{F}$ is closed under composition, there is in fact a trivial solution. To encrypt a message $m$, compute $(\mathsf{Enc}_{pk}(m), \mathsf{id})$ using any non-malleable encryption scheme, where $\mathsf{id}$ is the identity function. Then, the homomorphic evaluation algorithm on input a ciphertext $(c, f_1)$ and a function $f_2 \in \mathcal{F}$ simply outputs $(c, f_2 \circ f_1)$, where $\circ$ denotes composition of functions. In this light, Prabhakaran and Rosulek focused on formalizing a meaningful notion of security for a-posteriori chosen-ciphertext attacks (CCA2), following previous relaxations of such attacks [ADR02, CKN03, Gro04, PR07]. This is orthogonal to our setting. For us, the issue of avoiding a blow-up in the length of the ciphertext makes the problem challenging already for chosen-plaintext attacks.
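Here is a sketch of why closure under composition trivializes the problem, using a concrete closed class: affine maps $x \mapsto ax + b$ over a hypothetical modulus $N$. The inner ciphertext is an insecure placeholder for $\mathsf{Enc}_{pk}(m)$. After any number of evaluations, the ciphertext still carries a single two-number description of $f_\ell \circ \cdots \circ f_1$.

```python
from dataclasses import dataclass

N = 2**61 - 1  # hypothetical message space Z_N


@dataclass(frozen=True)
class Affine:
    """f(x) = a*x + b mod N. Affine maps are closed under composition."""
    a: int
    b: int

    def __call__(self, x):
        return (self.a * x + self.b) % N

    def then(self, other):
        """Return other o self, i.e. the map x -> other(self(x))."""
        return Affine(other.a * self.a % N, (other.a * self.b + other.b) % N)


IDENTITY = Affine(1, 0)


def encrypt(m):
    # Placeholder for Enc_pk(m) under a non-malleable scheme (no security here).
    return ("nm-ct", m), IDENTITY


def hom_eval(ct, f2):
    c, f1 = ct
    return c, f1.then(f2)  # (c, f2 o f1): the description stays one Affine


def decrypt(ct):
    (_, m), f = ct  # "decrypt" the placeholder, then apply the composed map
    return f(m)


ct = encrypt(10)
for f in (Affine(3, 1), Affine(1, 5), Affine(2, 0)):
    ct = hom_eval(ct, f)
print(decrypt(ct), ct[1])  # 72 Affine(a=6, b=12)
```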

Finally,  we  note  that  targeted  malleability  shares  a  somewhat  similar  theme  with  the  problem  of 
outsourcing  a  computation  in  a  verifiable  manner  from  a  computationally-weak  client  to  a  powerful 
server  (see,  for  example,  [GKR08,  GGP10,  CKV10,  AIK10]  and  the  references  therein).  In  both 
settings  the  main  goal  from  the  security  aspect  is  to  guarantee  that  a  “correct”  or  an  “allowable” 
computation  is  performed.  From  the  efficiency  aspect,  however,  the  two  settings  significantly 
differ:  whereas  for  targeted  malleability  our  main  focus  is  to  prevent  a  blow-up  in  the  length  of 
the  ciphertext  resulting  from  repeated  applications  of  a  computation,  for  verifiable  computation  the 
main  focus  is  to  minimize  the  client’s  computational  effort  within  a  single  computation. 




1.4  Paper  Organization 

The  remainder  of  this  paper  is  organized  as  follows.  In  Section  2  we  present  the  basic  tools  that 
are  used  in  our  constructions.  In  Section  3  we  formalize  the  notion  of  targeted  malleability.  In 
Sections  4  and  5  we  present  our  constructions.  Finally,  in  Section  6  we  discuss  possible  extensions 
of  our  work  and  several  open  problems. 

2  Preliminaries 

In this section we present the basic tools that are used in our constructions: public-key encryption and homomorphic encryption, succinct non-interactive arguments, and non-interactive zero-knowledge proofs.

2.1  Public-Key  Encryption 

A public-key encryption scheme is a triplet $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec})$ of probabilistic polynomial-time algorithms, where KeyGen is the key-generation algorithm, Enc is the encryption algorithm, and Dec is the decryption algorithm. The key-generation algorithm KeyGen receives as input the security parameter, and outputs a public key $pk$ and a secret key $sk$. The encryption algorithm Enc receives as input a public key $pk$ and a message $m$, and outputs a ciphertext $c$. The decryption algorithm Dec receives as input a ciphertext $c$ and a secret key $sk$, and outputs a message $m$ or the symbol $\perp$.

Functionality.  In  terms  of  functionality,  in  this  paper  we  require  the  property  of  almost-all-keys 
perfect  decryption  [DNR04],  defined  as  follows: 

Definition 2.1. A public-key encryption scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec})$ has almost-all-keys perfect decryption if there exists a negligible function $\nu(k)$ such that for all sufficiently large $k$, with probability $1 - \nu(k)$ over the choice of $(sk, pk) \leftarrow \mathsf{KeyGen}(1^k)$, for any message $m$ it holds that

$$\Pr\left[\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m)) = m\right] = 1,$$

where the probability is taken over the internal randomness of Enc and Dec.

We  note  that  Dwork,  Naor,  and  Reingold  [DNR04]  proposed  a  general  transformation  turning 
any  encryption  scheme  into  one  that  has  almost-all-keys  perfect  decryption.  When  starting  with 
a  scheme  that  has  a  very  low  error  probability,  their  transformation  only  changes  the  random 
bits  used  by  the  encryption  algorithm,  and  in  our  setting  this  is  important  as  it  preserves  the 
homomorphic  operations.  When  starting  with  a  scheme  that  has  a  significant  error  probability,  we 
note  that  the  error  probability  can  be  reduced  exponentially  by  encrypting  messages  under  several 
independently  chosen  public  keys,  and  decrypting  according  to  the  majority.  This  again  preserves 
the  homomorphic  operations. 
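Under the simplifying assumption (ours) that each of the $n$ decryptions errs independently with probability $p$, the failure probability of majority decryption is bounded by a binomial tail, which decays exponentially in $n$ for any constant $p < 1/2$. The Python sketch below computes this bound exactly with rational arithmetic; it models only the error probability, not the encryption scheme itself.

```python
from fractions import Fraction
from math import comb

def majority_failure_bound(p, n):
    """Upper bound on the probability that majority decryption over n independent
    keys fails, when each single decryption errs independently with probability p.

    With n odd, the majority is correct whenever at most (n - 1) / 2 decryptions err,
    so failure requires at least (n + 1) / 2 errors: a binomial tail."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a strict majority always exists")
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range((n + 1) // 2, n + 1))

for n in (1, 3, 5):
    print(n, majority_failure_bound(Fraction(1, 10), n))
# 1 1/10
# 3 7/250
# 5 107/12500
```

The bound is conservative: if erroneous decryptions disagree among themselves, even fewer correct decryptions suffice for a plurality.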

Security. In terms of security, we consider the most basic notion of semantic security against chosen-plaintext attacks, asking that any efficient adversary has only a negligible advantage in distinguishing between encryptions of different messages. This is formalized as follows:

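Definition 2.2. A public-key encryption scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec})$ is semantically secure against chosen-plaintext attacks if for any probabilistic polynomial-time adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ it holds that

$$\mathsf{Adv}^{\mathsf{CPA}}_{\Pi,\mathcal{A}}(k) \stackrel{\mathsf{def}}{=} \left| \Pr\left[\mathsf{Expt}^{\mathsf{CPA}}_{\Pi,\mathcal{A},0}(k) = 1\right] - \Pr\left[\mathsf{Expt}^{\mathsf{CPA}}_{\Pi,\mathcal{A},1}(k) = 1\right] \right|$$

is negligible in $k$, where $\mathsf{Expt}^{\mathsf{CPA}}_{\Pi,\mathcal{A},b}(k)$ is defined as follows:

1. $(sk, pk) \leftarrow \mathsf{KeyGen}(1^k)$.

2. $(m_0, m_1, \mathsf{state}) \leftarrow \mathcal{A}_1(1^k, pk)$ such that $|m_0| = |m_1|$.

3. $c^* \leftarrow \mathsf{Enc}_{pk}(m_b)$.

4. $b' \leftarrow \mathcal{A}_2(c^*, \mathsf{state})$.

5. Output $b'$.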
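The experiment can be phrased as a small test harness. The Python sketch below is ours: `scheme` and `adversary` are hypothetical objects exposing the methods used in the code, and `estimate_advantage` approximates $\mathsf{Adv}^{\mathsf{CPA}}_{\Pi,\mathcal{A}}(k)$ by Monte Carlo sampling. As a sanity check it is run against a deliberately insecure scheme whose "encryption" is the identity map, for which an adversary that simply reads the challenge achieves advantage 1.

```python
import random

def expt_cpa(scheme, adversary, b, k, rng):
    """One run of Expt^CPA_{Pi,A,b}(k), following steps 1-5 of Definition 2.2."""
    sk, pk = scheme.keygen(k, rng)               # 1. (sk, pk) <- KeyGen(1^k)
    m0, m1, state = adversary.choose(k, pk)      # 2. (m0, m1, state) <- A1(1^k, pk)
    if len(m0) != len(m1):
        raise ValueError("A1 must output messages of equal length")
    c_star = scheme.enc(pk, (m0, m1)[b], rng)    # 3. c* <- Enc_pk(m_b)
    return adversary.guess(c_star, state)        # 4-5. b' <- A2(c*, state); output b'

def estimate_advantage(scheme, adversary, k, trials=1000, seed=0):
    """Monte Carlo estimate of |Pr[Expt_0 = 1] - Pr[Expt_1 = 1]| (simulation only)."""
    rng = random.Random(seed)
    freq = [sum(expt_cpa(scheme, adversary, b, k, rng) for _ in range(trials)) / trials
            for b in (0, 1)]
    return abs(freq[0] - freq[1])

class IdentityScheme:
    """Deliberately insecure toy: 'encryption' returns the plaintext unchanged."""
    def keygen(self, k, rng):
        return None, None

    def enc(self, pk, m, rng):
        return m

class ReadsPlaintext:
    """Adversary that outputs 1 exactly when the challenge ciphertext equals m1."""
    def choose(self, k, pk):
        return b"\x00", b"\x01", None

    def guess(self, c_star, state):
        return 1 if c_star == b"\x01" else 0

print(estimate_advantage(IdentityScheme(), ReadsPlaintext(), k=128))  # 1.0
```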

Homomorphic encryption. A public-key encryption scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec})$ is homomorphic with respect to a set of efficiently computable functions $\mathcal{F}$ if there exists a homomorphic evaluation algorithm HomEval that receives as input a public key $pk$, an encryption of a message $m$, and a function $f \in \mathcal{F}$, and outputs an encryption of the message $f(m)$. Formally, with overwhelming probability over the choice of $(sk, pk) \leftarrow \mathsf{KeyGen}(1^k)$ (as in Definition 2.1), for any ciphertext $c$ such that $\mathsf{Dec}_{sk}(c) \neq \perp$ and for any function $f \in \mathcal{F}$ it holds that $\mathsf{Dec}_{sk}(\mathsf{HomEval}_{pk}(c, f)) = f(\mathsf{Dec}_{sk}(c))$ with probability 1 over the internal randomness of HomEval and Dec.
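As a toy instance of this syntax (our illustration, not a scheme used in the paper), textbook ElGamal is homomorphic with respect to $\mathcal{F} = \{f_a : m \mapsto a \cdot m \bmod P\}$: scaling the second ciphertext component by $a$ scales the plaintext. The modulus and base below are assumptions chosen only so that the arithmetic is easy to follow; they have not been vetted for security. HomEval is deterministic, as permitted when function privacy is not required.

```python
import secrets

# Toy parameters (an assumption for illustration, NOT vetted for security):
# arithmetic modulo the Mersenne prime 2^127 - 1 with base 3.
P = 2**127 - 1
G = 3

def keygen():
    sk = secrets.randbelow(P - 2) + 1            # secret exponent x in [1, P - 2]
    return sk, pow(G, sk, P)                     # public key G^x mod P

def enc(pk, m, r=None):
    """Encrypt m in [1, P - 1] as (G^r, m * pk^r) mod P."""
    if r is None:
        r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), m * pow(pk, r, P) % P

def dec(sk, c):
    c1, c2 = c
    return c2 * pow(c1, P - 1 - sk, P) % P       # c1^(P-1-x) = c1^(-x) by Fermat's little theorem

def hom_eval(pk, c, a):
    """Deterministic HomEval for f_a(m) = a * m mod P: scale the second component by a."""
    c1, c2 = c
    return c1, c2 * a % P

sk, pk = keygen()
c = enc(pk, 42)
assert dec(sk, hom_eval(pk, c, 7)) == 294        # f_7(42) = 7 * 42
```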

The main property that is typically required from a homomorphic encryption scheme is compactness, asking that the length of the ciphertext does not trivially grow with the number of repeated homomorphic operations. In our setting, given an upper bound $t$ on the number of repeated homomorphic operations that can be applied to a ciphertext produced by the encryption algorithm, we are interested in minimizing the dependency of the length of the ciphertext on $t$.

An additional property, which we do not consider in this paper, is function privacy. Informally, this property asks that the homomorphic evaluation algorithm does not reveal which function from the set $\mathcal{F}$ it receives as input. We refer the reader to [GHV10] for a formal definition. We note that in our setting, where function privacy is not taken into account, we can assume without loss of generality that the homomorphic evaluation algorithm is deterministic.

2.2  Non-Interactive  Extractable  Arguments 

A non-interactive argument system for a language $L = \bigcup_{k \in \mathbb{N}} L(k)$ with a witness relation $R = \bigcup_{k \in \mathbb{N}} R(k)$ consists of a triplet of algorithms $(\mathsf{CRSGen}, P, V)$, where CRSGen is an algorithm generating a common reference string $\mathsf{crs}$, and $P$ and $V$ are the prover and verifier algorithms, respectively. The prover takes as input a triplet $(x, w, \mathsf{crs})$, where $(x, w) \in R$, and outputs an argument $\pi$. The verifier takes as input a triplet $(x, \pi, \mathsf{crs})$ and either accepts or rejects. In this paper we consider a setting where all three algorithms run in polynomial time, CRSGen and $P$ may be probabilistic, and $V$ is deterministic.

We require three properties from such a system. The first property is perfect completeness: for every $(x, w, \mathsf{crs})$ such that $(x, w) \in R$, the prover always generates an argument that is accepted by the verifier. The second property is knowledge extraction: for every efficient malicious prover $P^*$ there exists an efficient “knowledge extractor” $\mathsf{Ext}_{P^*}$ such that whenever $P^*$ outputs $(x, \pi)$ that is accepted by the verifier, $\mathsf{Ext}_{P^*}$, when given the random coins of $P^*$, can in fact produce a witness $w$ such that $(x, w) \in R$ with all but a negligible probability. We note that this implies, in particular, soundness against efficient provers.

The perfect completeness and knowledge extraction properties are in fact trivial to satisfy: the prover can output the witness $w$ as an argument, and the verifier checks that $(x, w) \in R$ (unlike for CS proofs [Mic00, Val08], we do not impose any non-trivial efficiency requirement on the verifier). The third property that we require from the argument system is that of having rather succinct arguments: there should exist a constant $0 < \gamma < 1$ such that the arguments are of length at most $\gamma |w|$.
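The trivial argument system described above can be made concrete with any NP relation. In the Python sketch below (ours), the relation, chosen only for illustration, is "$w$ is a nontrivial divisor of $x$". The argument is the witness itself, so completeness is immediate, and the extractor can read the witness directly off an accepted argument without needing the prover's coins. The same sketch shows what fails: the ratio of argument length to witness length is 1, so the system is not $\gamma$-succinct for any $\gamma < 1$.

```python
from typing import Optional

def in_relation(x: int, w: int) -> bool:
    """Illustrative NP relation R: w is a nontrivial divisor of x."""
    return 1 < w < x and x % w == 0

def prove(x: int, w: int, crs: bytes = b"") -> int:
    return w  # the argument is the witness; the CRS is ignored

def verify(x: int, proof: int, crs: bytes = b"") -> bool:
    return in_relation(x, proof)  # deterministic check that (x, proof) is in R

def extract(x: int, proof: int) -> Optional[int]:
    """Trivial extractor: any accepted argument is already a witness."""
    return proof if verify(x, proof) else None

def length_ratio(x: int, w: int) -> float:
    """|argument| / |witness| in bits; gamma-succinctness requires at most gamma < 1."""
    return prove(x, w).bit_length() / w.bit_length()

print(verify(91, prove(91, 7)), extract(91, 13), extract(91, 5), length_ratio(91, 7))
# True 13 None 1.0
```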


Definition 2.3. Let $0 < \gamma < 1$ be a constant. A $\gamma$-succinct non-interactive extractable argument system for a language $L = \bigcup_{k \in \mathbb{N}} L(k)$ with a witness relation $R_L = \bigcup_{k \in \mathbb{N}} R_L(k)$ is a triplet of probabilistic polynomial-time algorithms $(\mathsf{CRSGen}, P, V)$ with the following properties:

1. Perfect completeness: For every $k \in \mathbb{N}$ and $(x, w) \in R_L(k)$ it holds that

$$\Pr\left[ V(1^k, x, \pi, \mathsf{crs}) = 1 \;\middle|\; \begin{array}{l} \mathsf{crs} \leftarrow \mathsf{CRSGen}(1^k) \\ \pi \leftarrow P(1^k, x, w, \mathsf{crs}) \end{array} \right] = 1,$$

where the probability is taken over the internal randomness of CRSGen, $P$ and $V$.

2. Adaptive knowledge extraction: For every probabilistic polynomial-time algorithm $P^*$ there exist a probabilistic polynomial-time algorithm $\mathsf{Ext}_{P^*}$ and a negligible function $\nu(\cdot)$ such that

$$\Pr\left[ (x, w) \notin R_L \text{ and } V(1^k, x, \pi, \mathsf{crs}) = 1 \;\middle|\; \begin{array}{l} \mathsf{crs} \leftarrow \mathsf{CRSGen}(1^k),\; r \leftarrow \{0,1\}^* \\ (x, \pi) \leftarrow P^*(1^k, \mathsf{crs}; r) \\ w \leftarrow \mathsf{Ext}_{P^*}(1^k, \mathsf{crs}, r) \end{array} \right] \leq \nu(k)$$

for all sufficiently large $k$, where the probability is taken over the internal randomness of CRSGen, $P^*$, $V$, and $\mathsf{Ext}_{P^*}$.

3. $\gamma$-Succinct arguments: For every $k \in \mathbb{N}$, $(x, w) \in R_L(k)$ and $\mathsf{crs} \in \{0,1\}^*$, it holds that $P(1^k, x, w, \mathsf{crs})$ produces a distribution over strings of length at most $\gamma |w|$.


Instantiation. An argument system satisfying Definition 2.3 (with a deterministic verifier) was recently constructed by Groth [Gro10] in the common-reference string model, based on a certain “knowledge of exponent” assumption. His scheme is even zero-knowledge, and the length of the resulting arguments is essentially independent of the length of the witness. The length of the common-reference string, however, is at least quadratic in the length of the witness⁴, and this will limit our constructions to supporting only a constant number of repeated homomorphic operations. Any argument system satisfying Definition 2.3 with a common-reference string of length linear in the length of the witness will allow our first construction to support any logarithmic number of repeated homomorphic operations, and our second construction to support any polynomial number of such operations.
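The gap between quadratic and linear common-reference strings can be seen with a back-of-the-envelope recurrence. The model below is a simplification of our own, not a statement from the paper: we assume that the statement verified at level $i+1$ must include a run of the level-$i$ verifier, so its circuit size $s_{i+1}$ is at least the length of $\mathsf{crs}^{(i)}$, and that this length is $c \cdot s_i^e$ for a constant $c$ and exponent $e$. With $e = 2$ this gives $s_i \geq s_1^{2^{i-1}}$, which stays polynomial only for a constant number of levels; with $e = 1$ it gives $s_i = c^{i-1} s_1$, which stays polynomial for $O(\log k)$ levels.

```python
from typing import List

def level_sizes(s1: int, levels: int, exponent: int, const: int = 1) -> List[int]:
    """Circuit sizes s_1, ..., s_levels under the simplified model
    s_{i+1} = const * s_i ** exponent (CRS length of level i feeding level i + 1)."""
    sizes = [s1]
    for _ in range(levels - 1):
        sizes.append(const * sizes[-1] ** exponent)
    return sizes

print(level_sizes(100, 4, exponent=2))           # [100, 10000, 100000000, 10000000000000000]
print(level_sizes(100, 4, exponent=1, const=2))  # [100, 200, 400, 800]
```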


The running time of the knowledge extractor. The proofs of security of our constructions involve nested invocations of the knowledge extractors that are provided by Definition 2.3. When supporting only a constant number of repeated homomorphic operations, the simulation will always run in polynomial time. When supporting a super-constant number of repeated homomorphic operations, we need to require that the knowledge extractor $\mathsf{Ext}_{P^*}$ corresponding to a malicious prover $P^*$ runs in time that is linear in the running time of $P^*$. This (together with a common-reference string of linear length) will allow our first construction to support any logarithmic number of repeated homomorphic operations, and our second construction to support any polynomial number of such operations.
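A rough accounting of the nested extraction, under the simplifying assumption (ours) that an extractor runs in time at most $c$ times that of the algorithm it extracts from, for a constant $c \geq 1$. Writing $T_0$ for the running time of the adversary and $T_i$ for the running time after $i$ levels of nesting:

$$T_i \leq c \cdot T_{i-1} \quad\Longrightarrow\quad T_t \leq c^{t}\, T_0 = 2^{t \log_2 c}\, T_0,$$

which is polynomial in $k$ whenever $t = O(\log k)$. By contrast, an extractor running in time quadratic in that of the prover would give $T_i \leq T_{i-1}^2$, hence $T_t \leq T_0^{2^t}$, which is polynomial only for constant $t$.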

4 For proving the satisfiability of a circuit of size $s$, the common-reference string in [Gro10] consists of $O(s^2)$ group elements, taken from a group where (in particular) the discrete logarithm problem is assumed to be hard. Lipmaa [Lip11] was able to slightly reduce the number of group elements, but even in his construction it is still super-linear.
2.3 Non-Interactive Simulation-Sound Adaptive Zero-Knowledge Proofs


We  define  the  notion  of  a  non-interactive  simulation-sound  adaptive  zero-knowledge  proof  system 
[BFM88,  FLS90,  BSM+91,  Sah99]. 

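Definition 2.4. A non-interactive simulation-sound adaptive zero-knowledge proof system for a language $L = \bigcup_{k \in \mathbb{N}} L(k)$ with a witness relation $R_L = \bigcup_{k \in \mathbb{N}} R_L(k)$ is a tuple of probabilistic polynomial-time algorithms $\Pi = (\mathsf{CRSGen}, P, V, S_1, S_2)$ with the following properties:

1. Perfect completeness: For every $k \in \mathbb{N}$ and $(x, w) \in R_L(k)$ it holds that

$$\Pr\left[ V(1^k, x, \pi, \mathsf{crs}) = 1 \;\middle|\; \begin{array}{l} \mathsf{crs} \leftarrow \mathsf{CRSGen}(1^k) \\ \pi \leftarrow P(1^k, x, w, \mathsf{crs}) \end{array} \right] = 1,$$

where the probability is taken over the internal randomness of CRSGen, $P$ and $V$.

2. Adaptive soundness: For every algorithm $P^*$ there exists a negligible function $\nu(\cdot)$ such that

$$\Pr\left[ x \notin L(k) \text{ and } V(1^k, x, \pi, \mathsf{crs}) = 1 \;\middle|\; \begin{array}{l} \mathsf{crs} \leftarrow \mathsf{CRSGen}(1^k) \\ (x, \pi) \leftarrow P^*(1^k, \mathsf{crs}) \end{array} \right] \leq \nu(k)$$

for all sufficiently large $k$, where the probability is taken over the internal randomness of CRSGen, $P^*$, and $V$.

3. Adaptive zero knowledge: For every probabilistic polynomial-time algorithm $\mathcal{A}$ there exists a negligible function $\nu(\cdot)$ such that

$$\mathsf{Adv}^{\mathsf{ZK}}_{\Pi,\mathcal{A}}(k) \stackrel{\mathsf{def}}{=} \left| \Pr\left[\mathsf{Expt}^{\mathsf{ZK}}_{\Pi,\mathcal{A}}(k) = 1\right] - \Pr\left[\mathsf{Expt}^{\mathsf{ZK}}_{\Pi,\mathcal{A},S_1,S_2}(k) = 1\right] \right| \leq \nu(k)$$

for all sufficiently large $k$, where the experiment $\mathsf{Expt}^{\mathsf{ZK}}_{\Pi,\mathcal{A}}(k)$ is defined as:

(a) $\mathsf{crs} \leftarrow \mathsf{CRSGen}(1^k)$

(b) $b \leftarrow \mathcal{A}^{P(1^k, \cdot, \cdot, \mathsf{crs})}(1^k, \mathsf{crs})$

(c) Output $b$

and the experiment $\mathsf{Expt}^{\mathsf{ZK}}_{\Pi,\mathcal{A},S_1,S_2}(k)$ is defined as:

(a) $(\mathsf{crs}, \tau) \leftarrow S_1(1^k)$

(b) $b \leftarrow \mathcal{A}^{S_2'(1^k, \cdot, \cdot, \tau)}(1^k, \mathsf{crs})$, where $S_2'(1^k, x, w, \tau) = S_2(1^k, x, \tau)$

(c) Output $b$

4. Simulation soundness: For every probabilistic polynomial-time algorithm $\mathcal{A}$ there exists a negligible function $\nu(\cdot)$ such that

$$\mathsf{Adv}^{\mathsf{SS}}_{\Pi,\mathcal{A}}(k) \stackrel{\mathsf{def}}{=} \Pr\left[\mathsf{Expt}^{\mathsf{SS}}_{\Pi,\mathcal{A}}(k) = 1\right] \leq \nu(k)$$

for all sufficiently large $k$, where the experiment $\mathsf{Expt}^{\mathsf{SS}}_{\Pi,\mathcal{A}}(k)$ is defined as:

(a) $(\mathsf{crs}, \tau) \leftarrow S_1(1^k)$

(b) $(x, \pi) \leftarrow \mathcal{A}^{S_2(1^k, \cdot, \tau)}(1^k, \mathsf{crs})$

(c) Denote by $Q$ the set of $S_2$'s answers to $\mathcal{A}$'s oracle queries

(d) Output 1 if and only if $x \notin L(k)$, $\pi \notin Q$, and $V(1^k, x, \pi, \mathsf{crs}) = 1$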
3 Defining Targeted Malleability

In this section we introduce a framework for targeted malleability by formalizing non-malleability with respect to a specific set of functions. We begin by discussing the case of univariate functions, and then show that our approach naturally generalizes to the case of multivariate functions. Given an encryption scheme that is homomorphic with respect to a set of functions $\mathcal{F}$, we would like to capture the following notion of security: for any efficient adversary that is given an encryption $c$ of a message $m$ and outputs an encryption $c'$ of a message $m'$, it should hold that either (1) $m'$ is independent of $m$, (2) $c' = c$ (and thus $m' = m$), or (3) $c'$ is obtained by repeatedly applying the homomorphic evaluation algorithm on $c$ using functions $f \in \mathcal{F}$. The first two properties are the standard ones for non-malleability [DDN00], and the third property captures targeted malleability.

Following [DDN00, BS99, PSV07], we formalize a simulation-based notion of security that compares a real-world adversary to a simulator that is not given any ciphertexts as input. Specifically, we consider two experiments, a real-world experiment and a simulated experiment, and require that for any efficient real-world adversary there exists an efficient simulator such that the outputs of the two experiments are computationally indistinguishable⁵. We consider both chosen-plaintext attacks (CPA) and a-priori chosen-ciphertext attacks (CCA1). We assume that the set of functions $\mathcal{F}$ is recognizable in polynomial time, and it may or may not be closed under composition.

Chosen-plaintext attacks (CPA). In the real-world experiment we consider adversaries that are described by two algorithms $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$. The algorithm $\mathcal{A}_1$ takes as input the public key of the scheme, and outputs a description of a distribution $\mathcal{M}$ over messages, a state information $\mathsf{state}_1$ to be included in the output of the experiment, and a state information $\mathsf{state}_2$ to be given as input to the algorithm $\mathcal{A}_2$. We note that $\mathsf{state}_1$ and $\mathsf{state}_2$ may include $pk$ and $\mathcal{M}$. Then, the algorithm $\mathcal{A}_2$ takes as input the state information $\mathsf{state}_2$ and a sequence of ciphertexts $c^*_1, \ldots, c^*_r$ that are encryptions of messages $m_1, \ldots, m_r$ sampled from $\mathcal{M}$. The algorithm $\mathcal{A}_2$ outputs a sequence of ciphertexts $c_1, \ldots, c_q$, and the output of the experiment is defined as $(\mathsf{state}_1, m_1, \ldots, m_r, d_1, \ldots, d_q)$, where for every $j \in \{1, \ldots, q\}$ the value $d_j$ is one of two things: if $c_j$ is equal to the $i$-th input ciphertext $c^*_i$ for some $i$, then $d_j$ is a special symbol $\mathsf{copy}_i$; otherwise $d_j$ is the decryption of $c_j$.
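The rule defining $d_j$ in the real-world experiment amounts to a lookup followed by decryption. The Python sketch below is ours: ciphertexts are modeled as arbitrary comparable values, indices are 1-based as in the text, `dec` is a placeholder for $\mathsf{Dec}_{sk}(\cdot)$, and ties between identical challenge ciphertexts are resolved by taking the smallest index.

```python
from typing import Any, Callable, List, Sequence

def real_world_outputs(challenges: Sequence[Any], outputs: Sequence[Any],
                       dec: Callable[[Any], Any]) -> List[Any]:
    """Compute d_1, ..., d_q: ("copy", i) if c_j equals c*_i, else Dec_sk(c_j)."""
    result = []
    for c in outputs:
        matches = [i for i, c_star in enumerate(challenges, start=1) if c == c_star]
        result.append(("copy", matches[0]) if matches else dec(c))
    return result

print(real_world_outputs(["c1*", "c2*"], ["c2*", "fresh"], dec=lambda c: f"Dec({c})"))
# [('copy', 2), 'Dec(fresh)']
```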

In the simulated experiment the simulator is also described by two algorithms $\mathcal{S} = (\mathcal{S}_1, \mathcal{S}_2)$. The algorithm $\mathcal{S}_1$ takes as input the public key, and outputs a description of a distribution $\mathcal{M}$ over messages, a state information $\mathsf{state}_1$ to be included in the output of the experiment, and a state information $\mathsf{state}_2$ to be given as input to the algorithm $\mathcal{S}_2$ (as in the real world). Then, a sequence of messages is sampled from $\mathcal{M}$, but here the algorithm $\mathcal{S}_2$ does not receive the encryptions of these messages, but only $\mathsf{state}_2$. The algorithm $\mathcal{S}_2$ should output $q$ values, where each value can take one of three possible types. The first type is the special symbol $\mathsf{copy}_i$, and in this case we define $d_j = \mathsf{copy}_i$. This captures the ability of the real-world adversary to copy one of the ciphertexts. The second type is an index $i \in \{1, \ldots, r\}$ and a sequence of functions $f_1, \ldots, f_\ell \in \mathcal{F}$, where $\ell$ is at most some predetermined upper bound $t$ on the number of repeated homomorphic operations. In this case we define $d_j = f(m_i)$, where $f = f_1 \circ \cdots \circ f_\ell$. This captures the ability of the real-world adversary to choose one of its input ciphertexts and apply the homomorphic evaluation algorithm at most $t$ times. The third type is a ciphertext $c_j$, and in this case $d_j$ is defined as its decryption. As the simulator does not receive any ciphertexts as input, this captures the ability of the adversary to produce a ciphertext that is independent of its input ciphertexts. The output of the experiment is defined as $(\mathsf{state}_1, m_1, \ldots, m_r, d_1, \ldots, d_q)$.

5 As commented by Pass et al. [PSV07], a distinguisher between the two experiments corresponds to using a relation for capturing non-malleability as in [DDN00, BS99].

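Definition 3.1. Let $t = t(k)$ be a polynomial. A public-key encryption scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{HomEval})$ is $t$-bounded non-malleable against chosen-plaintext attacks with respect to a set of functions $\mathcal{F}$ if for any polynomials $r = r(k)$ and $q = q(k)$ and for any probabilistic polynomial-time algorithm $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ there exists a probabilistic polynomial-time algorithm $\mathcal{S} = (\mathcal{S}_1, \mathcal{S}_2)$ such that the distributions $\{\mathsf{Real}^{\mathsf{CPA}}_{\Pi,\mathcal{A},t,r,q}(k)\}_{k \in \mathbb{N}}$ and $\{\mathsf{Sim}^{\mathsf{CPA}}_{\Pi,\mathcal{S},t,r,q}(k)\}_{k \in \mathbb{N}}$ (see Figure 1) are computationally indistinguishable.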


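$\mathsf{Real}^{\mathsf{CPA}}_{\Pi,\mathcal{A},t,r,q}(k)$:

1. $(sk, pk) \leftarrow \mathsf{KeyGen}(1^k)$

2. $(\mathcal{M}, \mathsf{state}_1, \mathsf{state}_2) \leftarrow \mathcal{A}_1(1^k, pk)$

3. $(m_1, \ldots, m_r) \leftarrow \mathcal{M}$

4. $c^*_i \leftarrow \mathsf{Enc}_{pk}(m_i)$ for every $i \in \{1, \ldots, r\}$

5. $(c_1, \ldots, c_q) \leftarrow \mathcal{A}_2(1^k, c^*_1, \ldots, c^*_r, \mathsf{state}_2)$

6. For every $j \in \{1, \ldots, q\}$ let
$$d_j = \begin{cases} \mathsf{copy}_i & \text{if } c_j = c^*_i \\ \mathsf{Dec}_{sk}(c_j) & \text{otherwise} \end{cases}$$

7. Output $(\mathsf{state}_1, m_1, \ldots, m_r, d_1, \ldots, d_q)$

$\mathsf{Sim}^{\mathsf{CPA}}_{\Pi,\mathcal{S},t,r,q}(k)$:

1. $(sk, pk) \leftarrow \mathsf{KeyGen}(1^k)$

2. $(\mathcal{M}, \mathsf{state}_1, \mathsf{state}_2) \leftarrow \mathcal{S}_1(1^k, pk)$

3. $(m_1, \ldots, m_r) \leftarrow \mathcal{M}$

4. $(c_1, \ldots, c_q) \leftarrow \mathcal{S}_2(1^k, \mathsf{state}_2)$

5. For every $j \in \{1, \ldots, q\}$ let
$$d_j = \begin{cases} \mathsf{copy}_i & \text{if } c_j = \mathsf{copy}_i \\ f(m_i) & \text{if } c_j = (i, f_1, \ldots, f_\ell) \text{ where } i \in \{1, \ldots, r\},\ \ell \leq t,\ f_1, \ldots, f_\ell \in \mathcal{F}, \text{ and } f = f_1 \circ \cdots \circ f_\ell \\ \mathsf{Dec}_{sk}(c_j) & \text{otherwise} \end{cases}$$

6. Output $(\mathsf{state}_1, m_1, \ldots, m_r, d_1, \ldots, d_q)$

Figure 1: The distributions $\mathsf{Real}^{\mathsf{CPA}}_{\Pi,\mathcal{A},t,r,q}(k)$ and $\mathsf{Sim}^{\mathsf{CPA}}_{\Pi,\mathcal{S},t,r,q}(k)$.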
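The three output types of $\mathcal{S}_2$ in $\mathsf{Sim}^{\mathsf{CPA}}$ can be interpreted by a short function. In the Python sketch below, the tagged-tuple encoding (`"copy"`, `"apply"`, `"ct"`) is our own, `in_F` stands in for the membership test for $\mathcal{F}$, and `dec` for $\mathsf{Dec}_{sk}(\cdot)$. Following the figure, $f = f_1 \circ \cdots \circ f_\ell$ is read as ordinary composition, so $f_\ell$ is applied first; malformed `"apply"` outputs are rejected rather than passed to decryption.

```python
from functools import reduce
from typing import Any, Callable, Sequence

def simulated_output(item: tuple, messages: Sequence[Any], dec: Callable[[Any], Any],
                     t: int, in_F: Callable[[Callable], bool]) -> Any:
    """Compute d_j from one output of S_2 (encoding is ours):
      ("copy", i)               -> copy_i
      ("apply", i, [f_1..f_l])  -> f(m_i) with f = f_1 ∘ ... ∘ f_l, l <= t, all f_i in F
      ("ct", c)                 -> Dec_sk(c)
    """
    kind = item[0]
    if kind == "copy":
        return ("copy", item[1])
    if kind == "apply":
        _, i, funcs = item
        if not 1 <= i <= len(messages) or len(funcs) > t or not all(in_F(f) for f in funcs):
            raise ValueError("not a legal (i, f_1, ..., f_l) output")
        # Ordinary composition: apply f_l first and f_1 last.
        return reduce(lambda acc, f: f(acc), reversed(funcs), messages[i - 1])
    if kind == "ct":
        return dec(item[1])
    raise ValueError(f"unknown output type {kind!r}")

double, inc = (lambda m: 2 * m), (lambda m: m + 1)
print(simulated_output(("apply", 2, [double, inc]), [10, 20], dec=str, t=2, in_F=callable))
# 42  (inc is applied to m_2 = 20 first, then double)
```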


Dealing with multivariate functions. Our approach naturally generalizes to the case of multivariate functions as follows. Fix a set $\mathcal{F}$ of functions that are defined on $d$-tuples of plaintexts for some integer $d$, and let $\mathcal{A}$ be an efficient adversary that is given a sequence of ciphertexts $c^*_1, \ldots, c^*_r$ and outputs a sequence of ciphertexts $c_1, \ldots, c_q$, as in Definition 3.1. Intuitively, for each output ciphertext $c_j$ it should hold that either (1) $\mathsf{Dec}_{sk}(c_j)$ is independent of $c^*_1, \ldots, c^*_r$, (2) $c_j = c^*_i$ for some $i \in \{1, \ldots, r\}$, or (3) $c_j$ is obtained by repeatedly applying the homomorphic evaluation algorithm using functions from the set $\mathcal{F}$ and a sequence of ciphertexts, each of which is either taken from $c^*_1, \ldots, c^*_r$ or is independent of them.

Formally, the distribution $\mathsf{Real}^{\mathsf{CPA}}_{\Pi,\mathcal{A},t,r,q}(k)$ is not modified, and the distribution $\mathsf{Sim}^{\mathsf{CPA}}_{\Pi,\mathcal{S},t,r,q}(k)$ is modified by only changing the output $c_j = (i, f_1, \ldots, f_\ell)$ of $\mathcal{S}_2$ to a $d$-ary tree of depth at most $t$: each internal node contains a description of a function from the set $\mathcal{F}$, and each leaf contains either an index $i \in \{1, \ldots, r\}$ or a plaintext $m$. The corresponding value $d_j$ is then computed by evaluating the tree bottom-up, where each index $i$ is replaced by the plaintext $m_i$ that was sampled from $\mathcal{M}$.
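Evaluating such a tree is a direct recursion. The Python sketch below uses data types of our own choosing (`Index`, `Plain`, `Node`) and measures depth as the number of function applications on the longest root-to-leaf path; each internal node is assumed to hold a $d$-ary function together with its $d$ children.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

@dataclass
class Index:
    """Leaf holding an index i in {1, ..., r} (replaced by the sampled m_i)."""
    i: int

@dataclass
class Plain:
    """Leaf holding an explicit plaintext m."""
    m: int

@dataclass
class Node:
    """Internal node: a d-ary function from F and its d children."""
    f: Callable[..., int]
    children: Sequence["Tree"]

Tree = Union[Index, Plain, Node]

def depth(tree: Tree) -> int:
    """Number of function applications on the longest root-to-leaf path."""
    if isinstance(tree, Node):
        return 1 + max(depth(child) for child in tree.children)
    return 0

def evaluate(tree: Tree, messages: Sequence[int]) -> int:
    """Bottom-up evaluation, replacing each index i by the plaintext m_i."""
    if isinstance(tree, Index):
        return messages[tree.i - 1]
    if isinstance(tree, Plain):
        return tree.m
    return tree.f(*(evaluate(child, messages) for child in tree.children))

tree = Node(lambda a, b: a + b, [Node(lambda a, b: a * b, [Index(1), Index(2)]), Plain(5)])
print(evaluate(tree, [3, 4]), depth(tree))  # 17 2
```

A simulator output would be accepted only if `depth(tree) <= t`.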

Dealing with randomized functions. The above definitions assume that $\mathcal{F}$ is a set of deterministic functions. More generally, one can also consider randomized functions. There are two natural approaches for extending our framework to this setting. The first approach is to view each function $f \in \mathcal{F}$ and string $r \in \{0,1\}^*$ (of an appropriate length) as defining a function $f_r(m) = f(m; r)$, and to apply the above definitions to the set $\mathcal{F}' = \{f_r\}_{f \in \mathcal{F}, r \in \{0,1\}^*}$ of deterministic functions. The second approach is to modify the distribution $\mathsf{Sim}^{\mathsf{CPA}}_{\Pi,\mathcal{S},t,r,q}(k)$ as follows: instead of setting $d_j$ to the value $f(m_i)$, we sample $d_j$ from the distribution induced by the random variable $f(m_i)$. Either approach may be preferable depending on the context in which the encryption scheme is used, and to simplify the presentation in this paper we assume that $\mathcal{F}$ is a set of deterministic functions.
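The two approaches differ only in when the coins are fixed. In the Python sketch below, `add_random_bit` is a made-up randomized function $f(m; r) = m + (r \bmod 2)$ used purely for illustration; `derandomize` implements the first approach ($f_r(m) = f(m; r)$ for a fixed coin string $r$) and `sample_output` the second (drawing $d_j$ from the distribution of $f(m_i)$ with fresh coins).

```python
import random
from typing import Callable

def add_random_bit(m: int, coins: int) -> int:
    """Illustrative randomized function f(m; r) = m + (r mod 2)."""
    return m + (coins % 2)

def derandomize(f: Callable[[int, int], int], r: int) -> Callable[[int], int]:
    """First approach: the pair (f, r) defines the deterministic function f_r(m) = f(m; r)."""
    return lambda m: f(m, r)

def sample_output(f: Callable[[int, int], int], m: int, rng: random.Random) -> int:
    """Second approach: sample d_j from the distribution of f(m) using fresh coins."""
    return f(m, rng.getrandbits(32))

f_7 = derandomize(add_random_bit, 7)
print(f_7(10), f_7(10))  # 11 11  (fixed coins, so the value never changes)
rng = random.Random(0)
print(sorted({sample_output(add_random_bit, 10, rng) for _ in range(100)}))
```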

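A-priori chosen-ciphertext attacks (CCA1). Definition 3.1 generalizes to a-priori chosen-ciphertext attacks by providing the algorithm $\mathcal{A}_1$ with oracle access to decryption before it chooses the distribution $\mathcal{M}$. At the same time, however, the simulator still needs to specify the distribution $\mathcal{M}$ without having such access (this is also known as non-assisted simulation). Specifically, we define $\{\mathsf{Real}^{\mathsf{CCA1}}_{\Pi,\mathcal{A},t,r,q}(k)\}_{k \in \mathbb{N}}$ and $\{\mathsf{Sim}^{\mathsf{CCA1}}_{\Pi,\mathcal{S},t,r,q}(k)\}_{k \in \mathbb{N}}$ as follows: $\mathsf{Real}^{\mathsf{CCA1}}_{\Pi,\mathcal{A},t,r,q}(k)$ is obtained from $\mathsf{Real}^{\mathsf{CPA}}_{\Pi,\mathcal{A},t,r,q}(k)$ by providing $\mathcal{A}_1$ with oracle access to $\mathsf{Dec}_{sk}(\cdot)$, and $\mathsf{Sim}^{\mathsf{CCA1}}_{\Pi,\mathcal{S},t,r,q}(k)$ is identical to $\mathsf{Sim}^{\mathsf{CPA}}_{\Pi,\mathcal{S},t,r,q}(k)$.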

Definition 3.2. Let $t = t(k)$ be a polynomial. A public-key encryption scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{HomEval})$ is $t$-bounded non-malleable against a-priori chosen-ciphertext attacks with respect to a set of functions $\mathcal{F}$ if for any polynomials $r = r(k)$ and $q = q(k)$ and for any probabilistic polynomial-time algorithm $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ there exists a probabilistic polynomial-time algorithm $\mathcal{S} = (\mathcal{S}_1, \mathcal{S}_2)$ such that the distributions $\{\mathsf{Real}^{\mathsf{CCA1}}_{\Pi,\mathcal{A},t,r,q}(k)\}_{k \in \mathbb{N}}$ and $\{\mathsf{Sim}^{\mathsf{CCA1}}_{\Pi,\mathcal{S},t,r,q}(k)\}_{k \in \mathbb{N}}$ are computationally indistinguishable.

4  The  Path-Based  Construction 

In this section we present our first construction. The construction is based on any public-key encryption scheme that is homomorphic with respect to some set $\mathcal{F}$ of functions, a non-interactive zero-knowledge proof system, and $\gamma$-succinct non-interactive argument systems for $\gamma = 1/4$. The scheme is parameterized by an upper bound $t$ on the number of repeated homomorphic operations that can be applied to a ciphertext produced by the encryption algorithm. The scheme enjoys the feature that the dependency on $t$ is essentially eliminated from the length of the ciphertext, and shifted to the public key. The public key consists of $t + 1$ common reference strings: one for the zero-knowledge proof system, and $t$ for the succinct argument systems. We note that in various cases (such as argument systems in the common random string model) it may be possible to use only one common-reference string for all $t$ argument systems, and then the length of the public key decreases quite significantly.

In Section 4.1 we formally specify the building blocks of the scheme, and in Section 4.2 we provide a description of the scheme. In Section 4.3 we prove the security of the scheme against chosen-plaintext attacks (CPA), and in Section 4.4 we show that the proof in fact extends to deal with a-priori chosen-ciphertext attacks (CCA1).

4.1  The  Building  Blocks 

Our  construction  relies  on  the  following  building  blocks: 

1. A homomorphic public-key encryption scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{HomEval})$ with respect to an efficiently recognizable set of efficiently computable functions $\mathcal{F}$. We assume that the scheme has almost-all-keys perfect decryption (see Definition 2.1). In addition, as discussed in Section 2.1, since we do not consider function privacy we assume without loss of generality that HomEval is deterministic.

For any security parameter $k \in \mathbb{N}$ we denote by $\ell_{pk} = \ell_{pk}(k)$, $\ell_m = \ell_m(k)$, $\ell_r = \ell_r(k)$, and $\ell_c = \ell_c(k)$ the bit-lengths of the public key, plaintext, randomness of Enc, and ciphertext⁶, respectively, for the scheme $\Pi$. In addition, we denote by $V_{\mathcal{F}}$ the deterministic polynomial-time algorithm for testing membership in the set $\mathcal{F}$, and denote by $\ell_{\mathcal{F}} = \ell_{\mathcal{F}}(k)$ the bit-length of the description of each function $f \in \mathcal{F}$.

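2. A non-interactive deterministic-verifier simulation-sound adaptive zero-knowledge proof system (see Section 2.3) $\Pi^{(0)} = (\mathsf{CRSGen}^{(0)}, P^{(0)}, V^{(0)}, S_1^{(0)}, S_2^{(0)})$ for the NP-language $L^{(0)} = \bigcup_{k \in \mathbb{N}} L^{(0)}(k)$ defined as follows.

$$L^{(0)}(k) = \left\{ \left(pk_0, pk_1, c_0^{(0)}, c_1^{(0)}\right) \in \{0,1\}^{2\ell_{pk} + 2\ell_c} \;:\; \begin{array}{l} \exists\, (m, r_0, r_1) \in \{0,1\}^{\ell_m + 2\ell_r} \text{ s.t.} \\ c_0^{(0)} = \mathsf{Enc}_{pk_0}(m; r_0) \text{ and } c_1^{(0)} = \mathsf{Enc}_{pk_1}(m; r_1) \end{array} \right\}$$

For any security parameter $k \in \mathbb{N}$ we denote by $\ell_{\mathsf{crs}^{(0)}} = \ell_{\mathsf{crs}^{(0)}}(k)$ and $\ell_{\pi^{(0)}} = \ell_{\pi^{(0)}}(k)$ the bit-lengths of the common-reference strings produced by $\mathsf{CRSGen}^{(0)}$ and of the proofs produced by $P^{(0)}$, respectively. Without loss of generality we assume that $\ell_{\pi^{(0)}} \geq \max\{\ell_c, \ell_{\mathcal{F}}\}$ (as otherwise proofs can always be padded).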

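3. For every $i \in \{1, \ldots, t\}$, a 1/4-succinct non-interactive deterministic-verifier extractable argument system (see Section 2.2) $\Pi^{(i)} = (\mathsf{CRSGen}^{(i)}, P^{(i)}, V^{(i)})$ for the NP-language $L^{(i)} = \bigcup_{k \in \mathbb{N}} L^{(i)}(k)$ defined as follows.

$$L^{(i)}(k) = \left\{ \left(pk_0, pk_1, c_0^{(i)}, c_1^{(i)}, \mathsf{crs}^{(i-1)}, \ldots, \mathsf{crs}^{(0)}\right) \in \{0,1\}^{2\ell_{pk} + 2\ell_c + \sum_{j=0}^{i-1} \ell_{\mathsf{crs}^{(j)}}} \;:\; \begin{array}{l} \exists\, \left(c_0^{(i-1)}, c_1^{(i-1)}, \pi^{(i-1)}, f\right) \in \{0,1\}^{2\ell_c + \ell_{\pi^{(i-1)}} + \ell_{\mathcal{F}}} \text{ s.t.} \\ \bullet\ V_{\mathcal{F}}(f) = 1 \\ \bullet\ c_0^{(i)} = \mathsf{HomEval}_{pk_0}\left(c_0^{(i-1)}, f\right) \\ \bullet\ c_1^{(i)} = \mathsf{HomEval}_{pk_1}\left(c_1^{(i-1)}, f\right) \\ \bullet\ V^{(i-1)}\left(\left(pk_0, pk_1, c_0^{(i-1)}, c_1^{(i-1)}, \mathsf{crs}^{(i-2)}, \ldots, \mathsf{crs}^{(0)}\right), \pi^{(i-1)}, \mathsf{crs}^{(i-1)}\right) = 1 \end{array} \right\}$$

For any security parameter $k \in \mathbb{N}$ we denote by $\ell_{\mathsf{crs}^{(i)}} = \ell_{\mathsf{crs}^{(i)}}(k)$ and $\ell_{\pi^{(i)}} = \ell_{\pi^{(i)}}(k)$ the bit-lengths of the common-reference strings produced by $\mathsf{CRSGen}^{(i)}$ and of the arguments produced by $P^{(i)}$, respectively.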

4.2  The  Scheme 

The scheme $\Pi' = (\mathsf{KeyGen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{HomEval}')$ is parameterized by an upper bound $t$ on the number of repeated homomorphic operations that can be applied to a ciphertext produced by the encryption algorithm. The scheme is defined as follows:

6 For simplicity we assume a fixed upper bound $\ell_c$ on the length of ciphertexts, but this is not essential to our construction. More generally, one can allow the length of ciphertexts to increase as a result of applying the homomorphic operation.
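• Key generation: On input $1^k$ sample two pairs of keys $(sk_0, pk_0) \leftarrow \mathsf{KeyGen}(1^k)$ and $(sk_1, pk_1) \leftarrow \mathsf{KeyGen}(1^k)$. Then, for every $i \in \{0, \ldots, t\}$ sample $\mathsf{crs}^{(i)} \leftarrow \mathsf{CRSGen}^{(i)}(1^k)$. Output the secret key $sk = (sk_0, sk_1)$ and the public key $pk = (pk_0, pk_1, \mathsf{crs}^{(0)}, \ldots, \mathsf{crs}^{(t)})$.

• Encryption: On input a public key $pk$ and a plaintext $m$, sample $r_0, r_1 \in \{0,1\}^*$ uniformly at random, and output the ciphertext $c^{(0)} = (0, c_0^{(0)}, c_1^{(0)}, \pi^{(0)})$, where

$$c_0^{(0)} = \mathsf{Enc}_{pk_0}(m; r_0), \qquad c_1^{(0)} = \mathsf{Enc}_{pk_1}(m; r_1),$$

$$\pi^{(0)} \leftarrow P^{(0)}\left( \left(pk_0, pk_1, c_0^{(0)}, c_1^{(0)}\right), (m, r_0, r_1), \mathsf{crs}^{(0)} \right).$$

• Homomorphic evaluation: On input a public key $pk$, a ciphertext $(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$, and a function $f \in \mathcal{F}$, proceed as follows. If $i \notin \{0, \ldots, t-1\}$ or

$$V^{(i)}\left( \left(pk_0, pk_1, c_0^{(i)}, c_1^{(i)}, \mathsf{crs}^{(i-1)}, \ldots, \mathsf{crs}^{(0)}\right), \pi^{(i)}, \mathsf{crs}^{(i)} \right) = 0,$$

then output $\perp$. Otherwise, output the ciphertext $c^{(i+1)} = (i+1, c_0^{(i+1)}, c_1^{(i+1)}, \pi^{(i+1)})$, where

$$c_0^{(i+1)} = \mathsf{HomEval}_{pk_0}\left(c_0^{(i)}, f\right), \qquad c_1^{(i+1)} = \mathsf{HomEval}_{pk_1}\left(c_1^{(i)}, f\right),$$

$$\pi^{(i+1)} \leftarrow P^{(i+1)}\left( \left(pk_0, pk_1, c_0^{(i+1)}, c_1^{(i+1)}, \mathsf{crs}^{(i)}, \ldots, \mathsf{crs}^{(0)}\right), \left(c_0^{(i)}, c_1^{(i)}, \pi^{(i)}, f\right), \mathsf{crs}^{(i+1)} \right).$$

• Decryption: On input a secret key $sk$ and a ciphertext $(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$, output $\perp$ if $i \notin \{0, \ldots, t\}$ or

$$V^{(i)}\left( \left(pk_0, pk_1, c_0^{(i)}, c_1^{(i)}, \mathsf{crs}^{(i-1)}, \ldots, \mathsf{crs}^{(0)}\right), \pi^{(i)}, \mathsf{crs}^{(i)} \right) = 0.$$

Otherwise, compute $m_0 = \mathsf{Dec}_{sk_0}(c_0^{(i)})$ and $m_1 = \mathsf{Dec}_{sk_1}(c_1^{(i)})$. If $m_0 \neq m_1$ then output $\perp$, and otherwise output $m_0$.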

Note that at any point in time a ciphertext of the scheme is of the form $(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$, where $i \in \{0, \ldots, t\}$, $c_0^{(i)}$ and $c_1^{(i)}$ are ciphertexts of the underlying encryption scheme, and $\pi^{(i)}$ is a proof or an argument with respect to one of $\Pi^{(0)}, \ldots, \Pi^{(t)}$. The assumption that the argument systems $\Pi^{(1)}, \ldots, \Pi^{(t)}$ are 1/4-succinct implies that the length of their arguments is upper bounded by the length of the proofs of $\Pi^{(0)}$ (i.e., $\ell_{\pi^{(i)}} \leq \ell_{\pi^{(0)}}$ for every $i \in \{1, \ldots, t\}$). Thus, the only dependency on $t$ in the length of the ciphertext results from the $\lceil \log_2(t+1) \rceil$ bits describing the prefix $i$.
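The length bound can be checked by induction on $i$. The witness for $L^{(i)}$ is $(c_0^{(i-1)}, c_1^{(i-1)}, \pi^{(i-1)}, f)$, of length $2\ell_c + \ell_{\pi^{(i-1)}} + \ell_{\mathcal{F}}$. Assuming, as in the padding convention for $\Pi^{(0)}$, that $\ell_{\pi^{(0)}} \geq \max\{\ell_c, \ell_{\mathcal{F}}\}$, and assuming inductively that $\ell_{\pi^{(i-1)}} \leq \ell_{\pi^{(0)}}$, the 1/4-succinctness of $\Pi^{(i)}$ gives

$$\ell_{\pi^{(i)}} \;\leq\; \frac{1}{4}\left(2\ell_c + \ell_{\pi^{(i-1)}} + \ell_{\mathcal{F}}\right) \;\leq\; \frac{1}{4}\left(2\ell_{\pi^{(0)}} + \ell_{\pi^{(0)}} + \ell_{\pi^{(0)}}\right) \;=\; \ell_{\pi^{(0)}}.$$

Consequently, every ciphertext has length at most $\lceil \log_2(t+1) \rceil + 2\ell_c + \ell_{\pi^{(0)}}$ bits.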


4.3  Chosen-Plaintext  Security 

We  now  prove  that  the  construction  offers  targeted  malleability  against  chosen-plaintext  attacks. 
For  concreteness  we  focus  on  the  case  of  a  single  message  and  a  single  ciphertext  (i.e.,  the  case 
r(k)  =  q(k)  =  1  in  Definition  3.1),  and  note  that  the  more  general  case  is  a  straightforward 
generalization. Given an adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ we construct a simulator $\mathcal{S} = (\mathcal{S}_1, \mathcal{S}_2)$ as follows.
The algorithm $\mathcal{S}_1$. The algorithm $\mathcal{S}_1$ is identical to $\mathcal{A}_1$, except that it also includes the public key $pk$ and the distribution $\mathcal{M}$ in the state that it forwards to $\mathcal{S}_2$. That is, $\mathcal{S}_1$ on input $(1^k, pk)$ invokes $\mathcal{A}_1$ on the same input to obtain a triplet $(\mathcal{M}, \mathsf{state}_1, \mathsf{state}_2)$, and then outputs $(\mathcal{M}, \mathsf{state}_1, \mathsf{state}_2')$, where $\mathsf{state}_2' = (pk, \mathcal{M}, \mathsf{state}_2)$.

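The algorithm $\mathcal{S}_2$. The algorithm $\mathcal{S}_2$ on input $(1^k, \mathsf{state}_2')$ where $\mathsf{state}_2' = (pk, \mathcal{M}, \mathsf{state}_2)$, first samples $m' \leftarrow \mathcal{M}$, and computes $c^* \leftarrow \mathsf{Enc}'_{pk}(m')$. Then, it samples $r \leftarrow \{0,1\}^*$, and computes

$$c = \big(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)}\big) = \mathcal{A}_2(1^k, c^*, \mathsf{state}_2; r).$$

If $i \notin \{0, \ldots, t\}$ or

$$V^{(i)}\big((pk_0, pk_1, c_0^{(i)}, c_1^{(i)}, \mathsf{crs}^{(i-1)}, \ldots, \mathsf{crs}^{(0)}), \pi^{(i)}, \mathsf{crs}^{(i)}\big) = 0$$

then $\mathcal{S}_2$ outputs $c$. Otherwise, $\mathcal{S}_2$ utilizes the knowledge extractors guaranteed by the argument systems $\Pi^{(i)}, \ldots, \Pi^{(1)}$ to generate a "certification chain" for $c$ of the form

$$\big(c^{(0)}, f^{(0)}, \ldots, c^{(i-1)}, f^{(i-1)}, c^{(i)}\big)$$

satisfying the following two properties:

1. $c^{(i)} = c$.

2. For every $j \in \{1, \ldots, i\}$ it holds that $c_0^{(j)} = \mathsf{HomEval}_{pk_0}(c_0^{(j-1)}, f^{(j-1)})$ and $c_1^{(j)} = \mathsf{HomEval}_{pk_1}(c_1^{(j-1)}, f^{(j-1)})$, where $c^{(j)} = (j, c_0^{(j)}, c_1^{(j)}, \pi^{(j)})$.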

We elaborate below on the process of generating the certification chain. If $\mathcal{S}_2$ fails in generating such a chain then it outputs $c$. Otherwise, $\mathcal{S}_2$ computes its output as follows:

1. If $c^{(0)} = c^*$ and $i = 0$, then $\mathcal{S}_2$ outputs $\mathsf{copy}_1$.

2. If $c^{(0)} = c^*$ and $i > 0$, then $\mathcal{S}_2$ outputs $f^{(0)} \circ \cdots \circ f^{(i-1)}$.

3. If $c^{(0)} \neq c^*$, then $\mathcal{S}_2$ outputs $c$.
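The case analysis performed by $\mathcal{S}_2$ in items 1 to 3 can be sketched in Python as below. The chain is assumed to be given as the pair $(c^{(0)}, [f^{(0)}, \ldots, f^{(i-1)}])$, with `None` signalling an invalid ciphertext or a failed extraction; the tagged return values are illustrative only. The sketch reads $f^{(0)} \circ \cdots \circ f^{(i-1)}$ as "apply $f^{(0)}$ first", which is the order in which the chain applies the functions.

```python
from functools import reduce
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical chain representation: (c^(0), [f^(0), ..., f^(i-1)]).
Chain = Tuple[Any, List[Callable[[Any], Any]]]


def s2_decide(c: Any, c_star: Any, chain: Optional[Chain]) -> Tuple[str, Any]:
    """Return ("ciphertext", c), ("copy", 1) or ("function", g) as in items 1-3."""
    if chain is None:
        # Invalid ciphertext or failed extraction: S2 outputs c itself.
        return ("ciphertext", c)
    c0, fs = chain
    if c0 != c_star:
        # Item 3: the chain does not start at the challenge ciphertext.
        return ("ciphertext", c)
    if not fs:
        # Item 1 (i = 0): c is the challenge ciphertext itself.
        return ("copy", 1)

    # Item 2 (i > 0): apply f^(0) first, then f^(1), and so on.
    def composed(m: Any) -> Any:
        return reduce(lambda acc, f: f(acc), fs, m)

    return ("function", composed)
```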

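Generating the certification chain. We say that a ciphertext $(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$ is valid if $i \in \{0, \ldots, t\}$ and

$$V^{(i)}\big((pk_0, pk_1, c_0^{(i)}, c_1^{(i)}, \mathsf{crs}^{(i-1)}, \ldots, \mathsf{crs}^{(0)}), \pi^{(i)}, \mathsf{crs}^{(i)}\big) = 1 .$$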

Viewing the algorithm $\mathcal{A}_2$ as a malicious prover with respect to the argument system $\Pi^{(i)}$ with the common reference string $\mathsf{crs}^{(i)}$, whenever $\mathcal{A}_2$ outputs a valid ciphertext $(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$, the algorithm $\mathcal{S}_2$ invokes the knowledge extractor $\mathsf{Ext}_{\mathcal{A}_2}$ that corresponds to $\mathcal{A}_2$ (recall Definition 2.3) to obtain a witness $(c_0^{(i-1)}, c_1^{(i-1)}, f^{(i-1)}, \pi^{(i-1)})$ to the fact that

$$\big(pk_0, pk_1, c_0^{(i)}, c_1^{(i)}, \mathsf{crs}^{(i-1)}, \ldots, \mathsf{crs}^{(0)}\big) \in L^{(i)} .$$

Note that $\mathcal{S}_2$ chooses the randomness for $\mathcal{A}_2$, which it can then provide to $\mathsf{Ext}_{\mathcal{A}_2}$. If successful then by the definition of $L^{(i)}$ we have a new valid ciphertext $c^{(i-1)} = (i-1, c_0^{(i-1)}, c_1^{(i-1)}, \pi^{(i-1)})$. If $i = 1$ then we are done. Otherwise (i.e., if $i > 1$), viewing the combination of $\mathcal{A}_2$ and $\mathsf{Ext}_{\mathcal{A}_2}$ as a malicious prover with respect to the argument system $\Pi^{(i-1)}$ with the common reference string $\mathsf{crs}^{(i-1)}$, the algorithm $\mathcal{S}_2$ invokes the knowledge extractor $\mathsf{Ext}_{(\mathcal{A}_2, \mathsf{Ext}_{\mathcal{A}_2})}$ that corresponds to the combination of $\mathcal{A}_2$ and $\mathsf{Ext}_{\mathcal{A}_2}$ to obtain a witness $(c_0^{(i-2)}, c_1^{(i-2)}, f^{(i-2)}, \pi^{(i-2)})$ to the fact that

$$\big(pk_0, pk_1, c_0^{(i-1)}, c_1^{(i-1)}, \mathsf{crs}^{(i-2)}, \ldots, \mathsf{crs}^{(0)}\big) \in L^{(i-1)} ,$$

and so on for $i$ iterations or until the first failure.
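The iterative extraction can be summarized by the following Python sketch. Each knowledge extractor is abstracted as a hypothetical callable `extract(level, ciphertext)` that returns the witness $(c_0^{(j-1)}, c_1^{(j-1)}, f^{(j-1)}, \pi^{(j-1)})$ or `None`. In the actual argument the extractor at each level is the one guaranteed for the combined prover built from $\mathcal{A}_2$ and the previous extractors, run on randomness chosen by $\mathcal{S}_2$; the sketch does not model that.

```python
from typing import Any, Callable, List, Optional, Tuple

Ciphertext = Tuple[int, Any, Any, Any]  # (j, c0, c1, pi)
Witness = Tuple[Any, Any, Any, Any]     # (c0_prev, c1_prev, f_prev, pi_prev)
Extractor = Callable[[int, Ciphertext], Optional[Witness]]


def certification_chain(
    c: Ciphertext, extract: Extractor
) -> Optional[Tuple[Ciphertext, List[Any]]]:
    """Walk back from a valid c^(i) to c^(0), collecting f^(0), ..., f^(i-1).

    Returns (c^(0), [f^(0), ..., f^(i-1)]) or None at the first failure.
    """
    current = c
    fs: List[Any] = []
    for level in range(c[0], 0, -1):
        # Extractor for Pi^(level): tries to recover a witness for `current`.
        witness = extract(level, current)
        if witness is None:
            return None
        c0_prev, c1_prev, f_prev, pi_prev = witness
        fs.append(f_prev)
        # If the witness is correct, this is a valid level-(level - 1) ciphertext.
        current = (level - 1, c0_prev, c1_prev, pi_prev)
    fs.reverse()  # collected as f^(i-1), ..., f^(0); return them in application order
    return current, fs
```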

Having described the simulator we now prove the following theorem stating the security of the scheme in the case $r(k) = q(k) = 1$ (noting again that the more general case is a straightforward generalization). As discussed in Section 2.2, the quadratic blow-up in the length of the common-reference string in Groth's argument system [Gro10] restricts our treatment here to a constant number $t$ of repeated homomorphic operations, and any improvement to Groth's argument system with a common-reference string of linear length will directly allow any logarithmic number of repeated homomorphic operations (and any polynomial number of such operations in the scheme presented in Section 5).


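Theorem 4.1. For any constant $t \in \mathbb{N}$ and for any probabilistic polynomial-time adversary $\mathcal{A}$ the distributions $\{\mathsf{Real}^{\mathsf{CPA}}_{\Pi', \mathcal{A}, t, r, q}(k)\}_{k \in \mathbb{N}}$ and $\{\mathsf{Sim}^{\mathsf{CPA}}_{\Pi', \mathcal{S}, t, r, q}(k)\}_{k \in \mathbb{N}}$ are computationally indistinguishable, for $r(k) = q(k) = 1$.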


Proof. We define a sequence of distributions $\mathcal{D}_1, \ldots, \mathcal{D}_7$ such that $\mathcal{D}_1 = \mathsf{Sim}^{\mathsf{CPA}}_{\Pi', \mathcal{S}, t, r, q}$ and $\mathcal{D}_7 = \mathsf{Real}^{\mathsf{CPA}}_{\Pi', \mathcal{A}, t, r, q}$, and prove that for every $i \in \{1, \ldots, 6\}$ the distributions $\mathcal{D}_i$ and $\mathcal{D}_{i+1}$ are computationally indistinguishable. For simplicity in what follows we assume that the scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{HomEval})$ actually has perfect decryption for all keys (and not with an overwhelming probability over the choice of keys). This assumption clearly does not hurt any of the indistinguishability arguments in our proof, since we can initially condition on the event that both $(sk_0, pk_0)$ and $(sk_1, pk_1)$ provide perfect decryption.


The distribution $\mathcal{D}_1$. This is the distribution $\mathsf{Sim}^{\mathsf{CPA}}_{\Pi', \mathcal{S}, t, r, q}$.

The distribution $\mathcal{D}_2$. This distribution is obtained from $\mathcal{D}_1$ via the following modification. As in $\mathcal{D}_1$, if $\mathcal{S}_2$ fails to obtain a certification chain, then output $(\mathsf{state}_1, m, \perp)$. Otherwise, the output is computed as follows:

1. If $c^{(0)} = c^*$ and $i = 0$ then output $(\mathsf{state}_1, m, \mathsf{copy}_1)$. This is identical to $\mathcal{D}_1$.

2. If $c^{(0)} = c^*$ and $i > 0$ then output $(\mathsf{state}_1, m, f(m))$, where $f = f^{(0)} \circ \cdots \circ f^{(i-1)}$. This is identical to $\mathcal{D}_1$.

3. If $c^{(0)} \neq c^*$ then compute the message $m^{(0)} = \mathsf{Dec}'_{sk}(c^{(0)})$. If $m^{(0)} \neq \perp$ then output $(\mathsf{state}_1, m, f(m^{(0)}))$, where $f = f^{(0)} \circ \cdots \circ f^{(i-1)}$, and otherwise output $(\mathsf{state}_1, m, \perp)$. That is, in this case instead of invoking the decryption algorithm $\mathsf{Dec}'$ on $c^{(i)}$, we invoke it on $c^{(0)}$, and then apply the functions given by the certification chain.

The distribution $\mathcal{D}_3$. This distribution is obtained from $\mathcal{D}_2$ by producing $\mathsf{crs}^{(0)}$ and $\pi^*$ (where $c^* = (c_0^*, c_1^*, \pi^*)$) using the simulator of the NIZK proof system $\Pi^{(0)}$.

The distribution $\mathcal{D}_4$. This distribution is obtained from $\mathcal{D}_3$ by producing the challenge ciphertext $c^* = (c_0^*, c_1^*, \pi^*)$ with $c_0^* = \mathsf{Enc}_{pk_0}(m)$ (instead of $c_0^* = \mathsf{Enc}_{pk_0}(m')$ as in $\mathcal{D}_3$).

The distribution $\mathcal{D}_5$. This distribution is obtained from $\mathcal{D}_4$ by producing the challenge ciphertext $c^* = (c_0^*, c_1^*, \pi^*)$ with $c_1^* = \mathsf{Enc}_{pk_1}(m)$ (instead of $c_1^* = \mathsf{Enc}_{pk_1}(m')$ as in $\mathcal{D}_4$).

The distribution $\mathcal{D}_6$. This distribution is obtained from $\mathcal{D}_5$ by producing $\mathsf{crs}^{(0)}$ and $\pi^*$ (where $c^* = (c_0^*, c_1^*, \pi^*)$) using the algorithms $\mathsf{CRSGen}^{(0)}$ and $P^{(0)}$, respectively (and not by using the simulator of the NIZK proof system $\Pi^{(0)}$ as in $\mathcal{D}_5$).

The distribution $\mathcal{D}_7$. This is the distribution $\mathsf{Real}^{\mathsf{CPA}}_{\Pi', \mathcal{A}, t, r, q}$.

Before proving that the above distributions are computationally indistinguishable, we first prove that whenever $\mathcal{A}_2$ outputs a valid ciphertext, $\mathcal{S}_2$ fails to produce a certification chain with only a negligible probability.

Lemma 4.2. In distributions $\mathcal{D}_1, \ldots, \mathcal{D}_6$, whenever $\mathcal{A}_2$ outputs a valid ciphertext, $\mathcal{S}_2$ generates a certification chain with all but a negligible probability.

Proof. Assume towards a contradiction that in one of $\mathcal{D}_1, \ldots, \mathcal{D}_6$ with a non-negligible probability it holds that $\mathcal{A}_2$ outputs a valid ciphertext but $\mathcal{S}_2$ fails to generate a certification chain. In particular, there exists an index $i \in \{1, \ldots, t\}$ for which with a non-negligible probability $\mathcal{A}_2$ outputs a valid ciphertext of the form $c^{(i)} = (i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$ but $\mathcal{S}_2$ fails to generate a certification chain. Recall that for generating a certification chain starting with $c^{(i)}$, the simulator $\mathcal{S}_2$ attempts to invoke $i$ knowledge extractors (until the first failure occurs) that we denote by $\mathsf{Ext}^{(i)}, \ldots, \mathsf{Ext}^{(1)}$. These knowledge extractors correspond to the malicious provers described in the description of $\mathcal{S}_2$ for the argument systems $\Pi^{(i)}, \ldots, \Pi^{(1)}$, respectively. Then, there exists an index $j \in \{1, \ldots, i\}$ for which with a non-negligible probability $\mathcal{S}_2$ is successful with $\mathsf{Ext}^{(i)}, \ldots, \mathsf{Ext}^{(j+1)}$ but fails with $\mathsf{Ext}^{(j)}$. The fact that $\mathcal{S}_2$ is successful with $\mathsf{Ext}^{(j+1)}$ (or, when $j = i$, the fact that $\mathcal{A}_2$ outputs a valid ciphertext) implies that it holds a valid ciphertext $c^{(j)} = (j, c_0^{(j)}, c_1^{(j)}, \pi^{(j)})$. In particular, it holds that

$$V^{(j)}\big((pk_0, pk_1, c_0^{(j)}, c_1^{(j)}, \mathsf{crs}^{(j-1)}, \ldots, \mathsf{crs}^{(0)}), \pi^{(j)}, \mathsf{crs}^{(j)}\big) = 1 .$$

Now, the fact that with a non-negligible probability $\mathcal{S}_2$ fails with $\mathsf{Ext}^{(j)}$ immediately translates to a malicious prover that contradicts the knowledge extraction property of the argument system $\Pi^{(j)}$. ■


We now prove that for every $i \in \{1, \ldots, 6\}$ the distributions $\mathcal{D}_i$ and $\mathcal{D}_{i+1}$ are computationally indistinguishable.
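For completeness, the standard hybrid step that combines Lemmas 4.3 to 4.8 into Theorem 4.1 is the triangle inequality below, stated for an arbitrary efficient distinguisher $\mathcal{B}$ (a name introduced here only for this display). Each of the six terms on the right is negligible in $k$, and a sum of a constant number of negligible functions is negligible, so the left-hand side is negligible as well.

```latex
\[
\Big| \Pr[\mathcal{B}(\mathcal{D}_1) = 1] - \Pr[\mathcal{B}(\mathcal{D}_7) = 1] \Big|
\;\le\; \sum_{i=1}^{6} \Big| \Pr[\mathcal{B}(\mathcal{D}_i) = 1] - \Pr[\mathcal{B}(\mathcal{D}_{i+1}) = 1] \Big|
\]
```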

Lemma 4.3. The distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ are computationally indistinguishable.

Proof. Whenever $\mathcal{A}_2$ outputs an invalid ciphertext, or outputs a valid ciphertext and $\mathcal{S}_2$ generates a certification chain, the distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ are identical. Indeed, in such a case the perfect decryption property guarantees that $\mathsf{Dec}'_{sk}(c^{(i)}) = f(m^{(0)})$. Therefore, $\mathcal{D}_1$ and $\mathcal{D}_2$ differ only when $\mathcal{A}_2$ outputs a valid ciphertext but $\mathcal{S}_2$ fails to generate a certification chain. Lemma 4.2 guarantees that this event occurs with only a negligible probability. ■

Lemma 4.4. The distributions $\mathcal{D}_2$ and $\mathcal{D}_3$ are computationally indistinguishable.

Proof. This follows from the zero-knowledge property of $\Pi^{(0)}$. Specifically, any efficient algorithm that distinguishes between $\mathcal{D}_2$ and $\mathcal{D}_3$ can be used (together with $\mathcal{S}$) in a straightforward manner to contradict the zero-knowledge property of $\Pi^{(0)}$. ■

Lemma 4.5. The distributions $\mathcal{D}_3$ and $\mathcal{D}_4$ are computationally indistinguishable.
Proof. The simulation soundness of $\Pi^{(0)}$ guarantees that instead of computing $\mathsf{Dec}'_{sk}(c^{(0)})$ we can verify that $V^{(0)}\big((pk_0, pk_1, c_0^{(0)}, c_1^{(0)}), \pi^{(0)}, \mathsf{crs}^{(0)}\big) = 1$, and then compute $\mathsf{Dec}_{sk_1}(c_1^{(0)})$. The resulting distribution will be identical with all but a negligible probability. This implies that we do not need the key $sk_0$, and this immediately translates to a distinguisher between $(pk_0, \mathsf{Enc}_{pk_0}(m))$ and $(pk_0, \mathsf{Enc}_{pk_0}(m'))$, where $m$ and $m'$ are sampled independently from $\mathcal{M}$. That is, the simulation soundness of $\Pi^{(0)}$ and the semantic security of the underlying encryption scheme guarantee that $\mathcal{D}_3$ and $\mathcal{D}_4$ are computationally indistinguishable. ■

Lemma 4.6. The distributions $\mathcal{D}_4$ and $\mathcal{D}_5$ are computationally indistinguishable.

Proof. As in the proof of Lemma 4.5, the simulation soundness of $\Pi^{(0)}$ guarantees that instead of computing $\mathsf{Dec}'_{sk}(c^{(0)})$ we can verify that $V^{(0)}\big((pk_0, pk_1, c_0^{(0)}, c_1^{(0)}), \pi^{(0)}, \mathsf{crs}^{(0)}\big) = 1$, and then compute $\mathsf{Dec}_{sk_0}(c_0^{(0)})$. The resulting distribution will be identical with all but a negligible probability. This implies that we do not need the key $sk_1$, and this immediately translates to a distinguisher between $(pk_1, \mathsf{Enc}_{pk_1}(m))$ and $(pk_1, \mathsf{Enc}_{pk_1}(m'))$, where $m$ and $m'$ are sampled independently from $\mathcal{M}$. That is, the simulation soundness of $\Pi^{(0)}$ and the semantic security of the underlying encryption scheme guarantee that $\mathcal{D}_4$ and $\mathcal{D}_5$ are computationally indistinguishable. ■

Lemma 4.7. The distributions $\mathcal{D}_5$ and $\mathcal{D}_6$ are computationally indistinguishable.

Proof. As in the proof of Lemma 4.4, this follows from the zero-knowledge property of $\Pi^{(0)}$. Specifically, any efficient algorithm that distinguishes between $\mathcal{D}_5$ and $\mathcal{D}_6$ can be used (together with $\mathcal{S}$) in a straightforward manner to contradict the zero-knowledge property of $\Pi^{(0)}$. ■

Lemma 4.8. The distributions $\mathcal{D}_6$ and $\mathcal{D}_7$ are computationally indistinguishable.

Proof. In the distributions $\mathcal{D}_6$ and $\mathcal{D}_7$ the algorithm $\mathcal{A}_2$ receives an encryption $c^*$ of $m$ and outputs a ciphertext $c$. First, we note that in both $\mathcal{D}_6$ and $\mathcal{D}_7$, if $c = c^*$ then the output is $(\mathsf{state}_1, m, \mathsf{copy}_1)$, and if $c$ is invalid then the output is $(\mathsf{state}_1, m, \perp)$. Therefore, we now focus on the case that $c \neq c^*$ and $c$ is valid. In this case, in $\mathcal{D}_7$ the output is $(\mathsf{state}_1, m, \mathsf{Dec}'_{sk}(c))$, and we now show that with an overwhelming probability the same output is obtained also in $\mathcal{D}_6$.

Lemma 4.2 guarantees that whenever $c$ is valid $\mathcal{S}_2$ produces a certification chain with all but a negligible probability. There are now two cases to consider. In the first case, if $c^{(0)} = c^*$ and $i > 0$ then the output of $\mathcal{D}_6$ is $(\mathsf{state}_1, m, f(m))$, where $f = f^{(0)} \circ \cdots \circ f^{(i-1)}$. Since $c^*$ is in fact an encryption of $m$, the perfect decryption property guarantees that $\mathsf{Dec}'_{sk}(c) = f(m)$. In the second case, if $c^{(0)} \neq c^*$ then the output of $\mathcal{D}_6$ is $(\mathsf{state}_1, m, f(m^{(0)}))$ where $m^{(0)} = \mathsf{Dec}'_{sk}(c^{(0)})$. Again by the perfect decryption property, it holds that $\mathsf{Dec}'_{sk}(c) = f(m^{(0)})$. ■

This  concludes  the  proof  of  Theorem  4.1.  ■ 

4.4  Chosen-Ciphertext  Security 

We now show that the proof of security in Section 4.3 in fact extends to the setting of a-priori chosen-ciphertext attacks (CCA1). The difficulty in extending the proof is that whereas the adversary $\mathcal{A}_1$ is given oracle access to the decryption algorithm, the simulator $\mathcal{S}_1$ is not given such access. Therefore, it is not immediately clear that $\mathcal{S}_1$ can correctly simulate the decryption queries of $\mathcal{A}_1$. We note that this issue seems to capture the main difference between the simulation-based and the indistinguishability-based approaches for defining non-malleability, as pointed out by Bellare and Sahai [BS99].
We resolve this issue using the approach of [DDN00, BS99]: $\mathcal{S}$ will not run $\mathcal{A}$ on the given public key $pk$, but instead will sample a new public key $pk'$ together with a corresponding secret key $sk'$, and run $\mathcal{A}$ on $pk'$. This way, $\mathcal{S}$ can use the secret key $sk'$ for answering all of $\mathcal{A}_1$'s decryption queries. In addition, when $\mathcal{A}_2$ outputs a ciphertext, $\mathcal{S}$ then uses $sk'$ for "translating" this ciphertext from $pk'$ to $pk$.

We now provide the modified description of $\mathcal{S}$. Given an adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ we define the simulator $\mathcal{S} = (\mathcal{S}_1, \mathcal{S}_2)$ as follows.

The algorithm $\mathcal{S}_1$. The algorithm $\mathcal{S}_1$ on input $(1^k, pk)$ first samples $(sk', pk') \leftarrow \mathsf{KeyGen}'(1^k)$. Then, it invokes $\mathcal{A}_1$ on the input $(1^k, pk')$ while answering decryption queries using the secret key $sk'$, to obtain a triplet $(\mathcal{M}, \mathsf{state}_1, \mathsf{state}_2)$. Finally, it outputs $(\mathcal{M}, \mathsf{state}_1, \mathsf{state}_2')$ where $\mathsf{state}_2' = (\mathcal{M}, pk, sk', pk', \mathsf{state}_2)$.


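The algorithm $\mathcal{S}_2$. The algorithm $\mathcal{S}_2$ on input $(1^k, \mathsf{state}_2')$ where $\mathsf{state}_2' = (\mathcal{M}, pk, sk', pk', \mathsf{state}_2)$, first samples $m' \leftarrow \mathcal{M}$ and computes $c^* \leftarrow \mathsf{Enc}'_{pk'}(m')$. Then, it samples $r \leftarrow \{0,1\}^*$ and computes

$$c = \big(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)}\big) = \mathcal{A}_2(1^k, c^*, \mathsf{state}_2; r).$$

If $c$ is invalid with respect to $pk'$, that is, if $i \notin \{0, \ldots, t\}$ or

$$V^{(i)}\big((pk_0', pk_1', c_0^{(i)}, c_1^{(i)}, \mathsf{crs}'^{(i-1)}, \ldots, \mathsf{crs}'^{(0)}), \pi^{(i)}, \mathsf{crs}'^{(i)}\big) = 0$$

then $\mathcal{S}_2$ outputs any ciphertext that is invalid with respect to $pk$ (e.g., $(t+1, \perp, \perp, \perp)$). Otherwise, as in Section 4.3, $\mathcal{S}_2$ utilizes the knowledge extractors guaranteed by the argument systems $\Pi^{(i)}, \ldots, \Pi^{(1)}$ to generate a "certification chain" for $c$ of the form

$$\big(c^{(0)}, f^{(0)}, \ldots, c^{(i-1)}, f^{(i-1)}, c^{(i)}\big).$$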

If  S2  fails  in  generating  such  a  chain  then  it  again  outputs  any  invalid  ciphertext  with  respect  to 
pk.  Otherwise,  S2  computes  its  output  as  follows: 

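1. If $c^{(0)} = c^*$ and $i = 0$, then $\mathcal{S}_2$ outputs $\mathsf{copy}_1$.

2. If $c^{(0)} = c^*$ and $i > 0$, then $\mathcal{S}_2$ outputs $f^{(0)} \circ \cdots \circ f^{(i-1)}$.

3. If $c^{(0)} \neq c^*$, then $\mathcal{S}_2$ outputs the ciphertext $\hat{c}$ that is obtained by "translating" $c$ from $pk'$ to $pk$ as follows. First, $\mathcal{S}_2$ computes $\hat{m} = \mathsf{Dec}'_{sk'}(c^{(0)})$. Then, it computes $\hat{c}^{(0)} = \mathsf{Enc}'_{pk}(\hat{m})$, and $\hat{c}^{(j)} = \mathsf{HomEval}'_{pk}(\hat{c}^{(j-1)}, f^{(j-1)})$ for every $j \in \{1, \ldots, i\}$. The ciphertext $\hat{c}$ is then defined as $\hat{c} = \hat{c}^{(i)}$.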
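The translation step in item 3 can be sketched as follows. `dec_sk_prime`, `enc_pk` and `homeval_pk` are hypothetical stand-ins for $\mathsf{Dec}'_{sk'}$, $\mathsf{Enc}'_{pk}$ and $\mathsf{HomEval}'_{pk}$; the example at the bottom plugs in a toy, completely insecure "scheme" (tagged tuples) only to exercise the data flow.

```python
from typing import Any, Callable, List


def translate(
    c0_prime: Any,
    fs: List[Callable[[Any], Any]],
    dec_sk_prime: Callable[[Any], Any],
    enc_pk: Callable[[Any], Any],
    homeval_pk: Callable[[Any, Callable[[Any], Any]], Any],
) -> Any:
    """Re-create under pk the ciphertext that the chain describes under pk'."""
    m_hat = dec_sk_prime(c0_prime)  # m_hat = Dec'_{sk'}(c^(0))
    c_hat = enc_pk(m_hat)           # c_hat^(0) = Enc'_pk(m_hat)
    for f in fs:                    # c_hat^(j) = HomEval'_pk(c_hat^(j-1), f^(j-1))
        c_hat = homeval_pk(c_hat, f)
    return c_hat                    # c_hat = c_hat^(i)


# Toy, insecure stand-ins used only to exercise the data flow:
# a "ciphertext" under pk is ("pk", plaintext, number_of_homomorphic_steps).
if __name__ == "__main__":
    result = translate(
        ("pk'", 5),
        [lambda m: m + 1, lambda m: m * 3],
        dec_sk_prime=lambda c: c[1],
        enc_pk=lambda m: ("pk", m, 0),
        homeval_pk=lambda c, f: ("pk", f(c[1]), c[2] + 1),
    )
    print(result)  # ('pk', 18, 2)
```

In the toy run the last component counts homomorphic steps, mirroring the prefix $i$ of a real ciphertext.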


Having described the modified simulator we now prove the following theorem stating the security of the scheme in the case $r(k) = q(k) = 1$ (noting once again that the more general case is a straightforward generalization).

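Theorem 4.9. For any constant $t \in \mathbb{N}$ and for any probabilistic polynomial-time adversary $\mathcal{A}$ the distributions $\{\mathsf{Real}^{\mathsf{CCA1}}_{\Pi', \mathcal{A}, t, r, q}(k)\}_{k \in \mathbb{N}}$ and $\{\mathsf{Sim}^{\mathsf{CCA1}}_{\Pi', \mathcal{S}, t, r, q}(k)\}_{k \in \mathbb{N}}$ are computationally indistinguishable, for $r(k) = q(k) = 1$.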


Proof. As in the proof of Theorem 4.1 we define a similar sequence of distributions $\mathcal{D}_1, \ldots, \mathcal{D}_7$ such that $\mathcal{D}_1 = \mathsf{Sim}^{\mathsf{CCA1}}_{\Pi', \mathcal{S}, t, r, q}$ and $\mathcal{D}_7 = \mathsf{Real}^{\mathsf{CCA1}}_{\Pi', \mathcal{A}, t, r, q}$, and prove that for every $i \in \{1, \ldots, 6\}$ the distributions $\mathcal{D}_i$ and $\mathcal{D}_{i+1}$ are computationally indistinguishable. The main difference is that in this case we change the distribution of the public key $pk'$ chosen by the simulator, and not of the given public key $pk$. The proofs are very similar, and therefore here we only point out the main differences.
The distribution $\mathcal{D}_1$. This is the distribution $\mathsf{Sim}^{\mathsf{CCA1}}_{\Pi', \mathcal{S}, t, r, q}$.

The distribution $\mathcal{D}_2$. This distribution is obtained from $\mathcal{D}_1$ via the following modification. As in $\mathcal{D}_1$, if $\mathcal{S}_2$ fails to obtain a certification chain, then output $(\mathsf{state}_1, m, \perp)$. Otherwise, the output is computed as follows:

1. If $c^{(0)} = c^*$ and $i = 0$ then output $(\mathsf{state}_1, m, \mathsf{copy}_1)$. This is identical to $\mathcal{D}_1$.

2. If $c^{(0)} = c^*$ and $i > 0$ then output $(\mathsf{state}_1, m, f(m))$, where $f = f^{(0)} \circ \cdots \circ f^{(i-1)}$. This is identical to $\mathcal{D}_1$.

3. If $c^{(0)} \neq c^*$ then compute $m^{(0)} = \mathsf{Dec}'_{sk'}(c^{(0)})$. If $m^{(0)} \neq \perp$ output $(\mathsf{state}_1, m, f(m^{(0)}))$, where $f = f^{(0)} \circ \cdots \circ f^{(i-1)}$, and otherwise output $(\mathsf{state}_1, m, \perp)$. That is, in this case instead of invoking the decryption algorithm $\mathsf{Dec}'$ on $c^{(i)}$, we invoke it on $c^{(0)}$, and then apply the functions given by the certification chain.

The distribution $\mathcal{D}_3$. This distribution is obtained from $\mathcal{D}_2$ by changing the distribution of $pk'$ (that is chosen by $\mathcal{S}$) and the challenge ciphertext: we produce $\mathsf{crs}'^{(0)}$ and $\pi^*$ (where $c^* = (c_0^*, c_1^*, \pi^*)$) using the simulator of the NIZK proof system $\Pi^{(0)}$.

The distribution $\mathcal{D}_4$. This distribution is obtained from $\mathcal{D}_3$ by producing the challenge ciphertext $c^* = (c_0^*, c_1^*, \pi^*)$ with $c_0^* = \mathsf{Enc}_{pk_0'}(m)$ (instead of $c_0^* = \mathsf{Enc}_{pk_0'}(m')$ as in $\mathcal{D}_3$).

The distribution $\mathcal{D}_5$. This distribution is obtained from $\mathcal{D}_4$ by producing the challenge ciphertext $c^* = (c_0^*, c_1^*, \pi^*)$ with $c_1^* = \mathsf{Enc}_{pk_1'}(m)$ (instead of $c_1^* = \mathsf{Enc}_{pk_1'}(m')$ as in $\mathcal{D}_4$).

The distribution $\mathcal{D}_6$. This distribution is obtained from $\mathcal{D}_5$ by changing the distribution of $pk'$ (that is chosen by $\mathcal{S}$) and the challenge ciphertext: we produce $\mathsf{crs}'^{(0)}$ and $\pi^*$ (where $c^* = (c_0^*, c_1^*, \pi^*)$) using the algorithms $\mathsf{CRSGen}^{(0)}$ and $P^{(0)}$, respectively (and not by using the simulator of the NIZK proof system $\Pi^{(0)}$ as in $\mathcal{D}_5$).

The distribution $\mathcal{D}_7$. This is the distribution $\mathsf{Real}^{\mathsf{CCA1}}_{\Pi', \mathcal{A}, t, r, q}$.


The remainder of the proof is essentially identical to the proof of Theorem 4.1. The only subtle point is that $\mathcal{S}_1$ can simulate the decryption oracle to $\mathcal{A}_1$ while knowing only one of $sk_0'$ and $sk_1'$. Specifically, given a ciphertext $(i, c_0^{(i)}, c_1^{(i)}, \pi^{(i)})$, it outputs $\perp$ if $i \notin \{0, \ldots, t\}$ or

$$V^{(i)}\big((pk_0', pk_1', c_0^{(i)}, c_1^{(i)}, \mathsf{crs}'^{(i-1)}, \ldots, \mathsf{crs}'^{(0)}), \pi^{(i)}, \mathsf{crs}'^{(i)}\big) = 0 .$$

Otherwise, it computes and outputs $m_b = \mathsf{Dec}_{sk_b'}(c_b^{(i)})$ for the value $b \in \{0, 1\}$ for which it knows the key $sk_b'$. The soundness of the proof system $\Pi^{(0)}$ and of the argument systems $\Pi^{(1)}, \ldots, \Pi^{(t)}$ guarantees that the simulation is correct with all but a negligible probability. ■
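The decryption oracle run by $\mathcal{S}_1$ can be sketched as follows. As before, `verify` and `dec_b` are hypothetical callables standing for the verification with respect to $pk'$ and for $\mathsf{Dec}_{sk_b'}$, and `None` stands for $\perp$. Unlike the real decryption algorithm, this oracle cannot compare $m_0$ with $m_1$; the soundness argument above is what makes its answers agree with the real oracle except with negligible probability.

```python
from typing import Any, Callable, Optional, Tuple


def simulated_dec_oracle(
    ct: Tuple[int, Any, Any, Any],
    t: int,
    verify: Callable[[int, Any, Any, Any], bool],
    b: int,
    dec_b: Callable[[Any], Any],
) -> Optional[Any]:
    """Answer a decryption query knowing only sk'_b (b in {0, 1}); None stands for ⊥."""
    i, c0, c1, proof = ct
    # Same validity checks as the real decryption algorithm, with respect to pk'.
    if not (0 <= i <= t) or not verify(i, c0, c1, proof):
        return None
    # No m0 == m1 comparison is possible: only the b-th component is decrypted.
    return dec_b((c0, c1)[b])
```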


5  The  Tree-Based  Construction 

In this section we present our second construction, which is obtained by modifying our first construction to offer a different trade-off between the length of the public key and the length of the ciphertext. As in Section 4, the scheme is parameterized by an upper bound $t$ on the number of repeated homomorphic operations that can be applied to a ciphertext produced by the encryption algorithm. Recall that in our first construction, the length of the ciphertext is essentially independent of $t$, and the public key consists of $t + 1$ common reference strings. In our second construction the number of common reference strings in the public key is only $\log t$, and a ciphertext now consists of $\log t$ ciphertexts of the underlying homomorphic scheme and $\log t$ succinct arguments. Such a trade-off may be preferable over the one offered by our first construction, for example, when using argument systems that are tailored to the NP languages under consideration and/or when it is not possible to use the same common reference string for all argument systems (depending, of course, on the length of the longest common reference strings).
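As a rough comparison of public-key sizes, the following sketch counts common reference strings: $t + 1$ for the path-based scheme, and $d + 1 = \lceil \log t \rceil + 1$ for the tree-based scheme if one also counts $\mathsf{crs}^{(0)}$, as the key-generation algorithm of Section 5.2 does. Taking logarithms base 2 is an assumption of this sketch.

```python
def crs_count_path(t: int) -> int:
    """Section 4: crs^(0), ..., crs^(t)."""
    return t + 1


def crs_count_tree(t: int) -> int:
    """Section 5: crs^(0), ..., crs^(d) with d = ceil(log2 t), for t >= 1."""
    if t < 1:
        raise ValueError("t must be at least 1")
    d = (t - 1).bit_length()  # equals ceil(log2 t) for every integer t >= 1
    return d + 1


if __name__ == "__main__":
    for t in (1, 8, 13, 64):
        print(t, crs_count_path(t), crs_count_tree(t))
    # 1 2 1
    # 8 9 4
    # 13 14 5
    # 64 65 7
```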

The main idea underlying this construction is that the arguments computed by the homomorphic evaluation algorithm form a tree structure instead of a path structure. Specifically, instead of using $t$ argument systems, we use only $d = \log t$ argument systems where the $i$-th one is used for arguing the well-formedness of a ciphertext after $2^i$ repeated homomorphic operations.

In Section 5.1 we formally specify the building blocks of the scheme, and in Section 5.2 we provide a description of the scheme and discuss its proof of security against a-priori chosen-ciphertext attacks (CCA1), which is rather similar to that of our first construction.

5.1  The  Building  Blocks 

Our  construction  relies  on  the  following  building  blocks: 

1. A homomorphic public-key encryption scheme $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{HomEval})$ with respect to an efficiently recognizable set of efficiently computable functions $\mathcal{F}$. We assume that the scheme has almost-all-key perfect decryption (see Definition 2.1). In addition, as discussed in Section 2.1, as we do not consider function privacy we assume without loss of generality that $\mathsf{HomEval}$ is deterministic.

For any security parameter $k \in \mathbb{N}$ we denote by $\ell_{pk} = \ell_{pk}(k)$, $\ell_m = \ell_m(k)$, $\ell_r = \ell_r(k)$, and $\ell_c = \ell_c(k)$ the bit-lengths of the public key, plaintext, randomness of $\mathsf{Enc}$, and ciphertext$^1$, respectively, for the scheme $\Pi$. In addition, we denote by $V_{\mathcal{F}}$ the deterministic polynomial-time algorithm for testing membership in the set $\mathcal{F}$, and denote by $\ell_{\mathcal{F}} = \ell_{\mathcal{F}}(k)$ the bit-length of the description of each function $f \in \mathcal{F}$.

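2. A non-interactive deterministic-verifier simulation-sound adaptive zero-knowledge proof system (see Section 2.3) $\Pi^{(0)} = (\mathsf{CRSGen}^{(0)}, P^{(0)}, V^{(0)})$ for the NP-language $L^{(0)} = \bigcup_{k \in \mathbb{N}} L^{(0)}(k)$ defined as follows.

$$L^{(0)}(k) = \left\{ \big(pk_0, pk_1, c_0^{(0)}, c_1^{(0)}\big) \in \{0,1\}^{2\ell_{pk} + 2\ell_c} \;:\; \begin{array}{l} \exists (m, r_0, r_1) \in \{0,1\}^{\ell_m + 2\ell_r} \text{ s.t.} \\ c_0^{(0)} = \mathsf{Enc}_{pk_0}(m; r_0) \text{ and } c_1^{(0)} = \mathsf{Enc}_{pk_1}(m; r_1) \end{array} \right\}$$

For any security parameter $k \in \mathbb{N}$ we denote by $\ell_{\mathsf{crs}^{(0)}} = \ell_{\mathsf{crs}^{(0)}}(k)$ and $\ell_{\pi^{(0)}} = \ell_{\pi^{(0)}}(k)$ the bit-lengths of the common reference strings produced by $\mathsf{CRSGen}^{(0)}$ and of the proofs produced by $P^{(0)}$, respectively.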

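3. For every $j \in \{1, \ldots, d\}$ (where $d = \lceil \log t \rceil$) a $1/7$-succinct non-interactive deterministic-verifier extractable argument system (see Section 2.2) $\Pi^{(j)} = (\mathsf{CRSGen}^{(j)}, P^{(j)}, V^{(j)})$ for the NP-language $L^{(j)} = \bigcup_{k \in \mathbb{N}} L^{(j)}(k)$ defined as follows for $j = 1$:

$$L^{(1)}(k) = \left\{ \big(pk_0, pk_1, c_0^{(1)}, c_1^{(1)}, c_0^{(2)}, c_1^{(2)}\big) \in \{0,1\}^{2\ell_{pk} + 4\ell_c} \;:\; \begin{array}{l} \exists f \in \{0,1\}^{\ell_{\mathcal{F}}} \text{ s.t.} \\ \bullet\; V_{\mathcal{F}}(f) = 1 \\ \bullet\; c_0^{(2)} = \mathsf{HomEval}_{pk_0}(c_0^{(1)}, f) \\ \bullet\; c_1^{(2)} = \mathsf{HomEval}_{pk_1}(c_1^{(1)}, f) \end{array} \right\}$$

and defined as follows for $j > 1$:

$$L^{(j)}(k) = \left\{ \big(pk_0, pk_1, c_0^{(1)}, c_1^{(1)}, c_0^{(2^j)}, c_1^{(2^j)}, \vec{\mathsf{crs}}^{(j-1)}\big) \in \{0,1\}^{\ell_x^{(j)}} \;:\; \begin{array}{l} \exists \big(c_0^{(2^{j-1})}, c_1^{(2^{j-1})}, c_0^{(2^{j-1}+1)}, c_1^{(2^{j-1}+1)}, f, \pi_L^{(j-1)}, \pi_R^{(j-1)}\big) \in \{0,1\}^{\ell_w^{(j)}} \text{ s.t.} \\ \bullet\; V_{\mathcal{F}}(f) = 1 \\ \bullet\; c_0^{(2^{j-1}+1)} = \mathsf{HomEval}_{pk_0}(c_0^{(2^{j-1})}, f) \\ \bullet\; c_1^{(2^{j-1}+1)} = \mathsf{HomEval}_{pk_1}(c_1^{(2^{j-1})}, f) \\ \bullet\; V^{(j-1)}\big((pk_0, pk_1, c_0^{(1)}, c_1^{(1)}, c_0^{(2^{j-1})}, c_1^{(2^{j-1})}, \vec{\mathsf{crs}}^{(j-2)}), \pi_L^{(j-1)}, \mathsf{crs}^{(j-1)}\big) = 1 \\ \bullet\; V^{(j-1)}\big((pk_0, pk_1, c_0^{(2^{j-1}+1)}, c_1^{(2^{j-1}+1)}, c_0^{(2^j)}, c_1^{(2^j)}, \vec{\mathsf{crs}}^{(j-2)}), \pi_R^{(j-1)}, \mathsf{crs}^{(j-1)}\big) = 1 \end{array} \right\}$$

where $\ell_x^{(j)} = 2\ell_{pk} + 4\ell_c + \sum_{i=1}^{j-1} \ell_{\mathsf{crs}^{(i)}}$, $\ell_w^{(j)} = 4\ell_c + \ell_{\mathcal{F}} + 2\ell_{\pi^{(j-1)}}$, and for every $1 \leq j \leq d$ we define $\vec{\mathsf{crs}}^{(j)} = (\mathsf{crs}^{(1)}, \ldots, \mathsf{crs}^{(j)})$ (with $\vec{\mathsf{crs}}^{(0)}$ denoting the empty tuple). For any security parameter $k \in \mathbb{N}$ we denote by $\ell_{\mathsf{crs}^{(j)}} = \ell_{\mathsf{crs}^{(j)}}(k)$ and $\ell_{\pi^{(j)}} = \ell_{\pi^{(j)}}(k)$ the bit-lengths of the common reference strings produced by $\mathsf{CRSGen}^{(j)}$ and of the arguments produced by $P^{(j)}$, respectively.

We note that for $j = 1$ we in fact do not need an argument system, as we can use the witness $f \in \mathcal{F}$ as the argument. Without loss of generality we assume that $\ell_{\pi^{(1)}} \geq \max\{\ell_c, \ell_{\mathcal{F}}\}$ (as otherwise arguments can always be padded). Thus, the assumption that the argument systems $\Pi^{(2)}, \ldots, \Pi^{(d)}$ are $1/7$-succinct implies that the length of their arguments is upper bounded by that of $\Pi^{(1)}$ (and therefore independent of $t$).

$^1$ For simplicity we assume a fixed upper bound $\ell_c$ on the length of ciphertexts, but this is not essential to our construction. More generally, one can allow the length of ciphertexts to increase as a result of applying the homomorphic operation.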


5.2  The  Scheme 


The scheme $\Pi' = (\mathsf{KeyGen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{HomEval}')$ is parameterized by an upper bound $t$ on the number of repeated homomorphic operations, and we let $d = \lceil \log t \rceil$. The key-generation and encryption algorithms are essentially identical to those described in Section 4.2:

• Key generation: On input $1^k$ sample two pairs of keys $(sk_0, pk_0) \leftarrow \mathsf{KeyGen}(1^k)$ and $(sk_1, pk_1) \leftarrow \mathsf{KeyGen}(1^k)$. Then, for every $j \in \{0, \ldots, d\}$ sample $\mathsf{crs}^{(j)} \leftarrow \mathsf{CRSGen}^{(j)}(1^k)$. Output the secret key $sk = (sk_0, sk_1)$ and the public key $pk = (pk_0, pk_1, \mathsf{crs}^{(0)}, \ldots, \mathsf{crs}^{(d)})$.

• Encryption: On input a public key $pk$ and a plaintext $m$, sample $r_0, r_1 \in \{0,1\}^{\ell_r}$ uniformly at random, and output the ciphertext $c^{(0)} = (c_0^{(0)}, c_1^{(0)}, \pi^{(0)})$, where

$$c_0^{(0)} = \mathsf{Enc}_{pk_0}(m; r_0),$$

$$c_1^{(0)} = \mathsf{Enc}_{pk_1}(m; r_1),$$

$$\pi^{(0)} \leftarrow P^{(0)}\big((pk_0, pk_1, c_0^{(0)}, c_1^{(0)}), (m, r_0, r_1), \mathsf{crs}^{(0)}\big).$$
• Homomorphic evaluation: The homomorphic evaluation algorithm follows the same approach used in Section 4.2, but computes the arguments of well-formedness in the form of a sparse binary tree. The leaves of the tree correspond to a chain of ciphertexts $(c^{(1)}, \ldots, c^{(t)})$ that are generated from one another using the homomorphic evaluation algorithm. Each internal node at level $j \in \{1, \ldots, d\}$ (where the leaves are considered to be at level 0) is a succinct argument for membership in the language $L^{(j)}$. We first describe how to generate the leaves and the internal nodes, and then describe the content of a ciphertext (i.e., which nodes of the tree should be contained in a ciphertext).


– The leaves: The leftmost leaf in the tree $c^{(1)} = (c_0^{(1)}, c_1^{(1)})$ is generated from a ciphertext $c^{(0)} = (c_0^{(0)}, c_1^{(0)}, \pi^{(0)})$ that is produced by the encryption algorithm and a function $f^{(0)} \in \mathcal{F}$. It is defined as

$$c_0^{(1)} = \mathsf{HomEval}_{pk_0}(c_0^{(0)}, f^{(0)}),$$

$$c_1^{(1)} = \mathsf{HomEval}_{pk_1}(c_1^{(0)}, f^{(0)}).$$

From this point on the ciphertext $c^{(0)}$, the function $f^{(0)}$, and the leaf $c^{(1)}$ are kept as part of all future ciphertexts.

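For every $i \in \{1, \ldots, t-1\}$ the leaf $c^{(i+1)} = (c_0^{(i+1)}, c_1^{(i+1)})$ is generated from the previous leaf $c^{(i)} = (c_0^{(i)}, c_1^{(i)})$ and a function $f^{(i)} \in \mathcal{F}$ by computing

$$c_0^{(i+1)} = \mathsf{HomEval}_{pk_0}(c_0^{(i)}, f^{(i)}),$$

$$c_1^{(i+1)} = \mathsf{HomEval}_{pk_1}(c_1^{(i)}, f^{(i)}).$$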


– The internal nodes: Each internal node $v$ at level $j \in \{1, \ldots, d\}$ is an argument for membership in the language $L^{(j)}$. For $j = 1$, the two children of $v$ are leaves $c^{(i)} = (c_0^{(i)}, c_1^{(i)})$ and $c^{(i+1)} = (c_0^{(i+1)}, c_1^{(i+1)})$, and in this case $v$ is an argument that $c^{(i+1)}$ is obtained from $c^{(i)}$ using the homomorphic operation with some function $f^{(i)} \in \mathcal{F}$. This is computed as:

$$v \leftarrow P^{(1)}\big((pk_0, pk_1, c_0^{(i)}, c_1^{(i)}, c_0^{(i+1)}, c_1^{(i+1)}), f^{(i)}, \mathsf{crs}^{(1)}\big).$$


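For every $j \in \{2, \ldots, d\}$, denote by $v_L$ and $v_R$ the two children of $v$. These are arguments for membership in $L^{(j-1)}$. Indexing the leaves in the subtree of $v$ (relative to this subtree) as $c^{(1)}, \ldots, c^{(2^j)}$, denote by $c^{(2^{j-1})} = (c_0^{(2^{j-1})}, c_1^{(2^{j-1})})$ and $c^{(2^{j-1}+1)} = (c_0^{(2^{j-1}+1)}, c_1^{(2^{j-1}+1)})$ the rightmost leaf in the subtree of $v_L$ and the leftmost leaf in the subtree of $v_R$, respectively. Then, the node $v$ is an argument that $v_L$ is a valid argument for $(c^{(1)}, \ldots, c^{(2^{j-1})})$, $v_R$ is a valid argument for $(c^{(2^{j-1}+1)}, \ldots, c^{(2^j)})$, and that $c^{(2^{j-1}+1)}$ is obtained from $c^{(2^{j-1})}$ using the homomorphic operation with some function $f^{(2^{j-1})} \in \mathcal{F}$. This is computed as $v \leftarrow P^{(j)}(x, w, \mathsf{crs}^{(j)})$, where:

$$x = \big(pk_0, pk_1, c_0^{(1)}, c_1^{(1)}, c_0^{(2^j)}, c_1^{(2^j)}, \vec{\mathsf{crs}}^{(j-1)}\big),$$

$$w = \big(c_0^{(2^{j-1})}, c_1^{(2^{j-1})}, c_0^{(2^{j-1}+1)}, c_1^{(2^{j-1}+1)}, f^{(2^{j-1})}, v_L, v_R\big).$$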
Figure 2: The above illustration shows the structure of a ciphertext after 13 repeated homomorphic operations. The ciphertext $c^{(0)}$ is an output of the encryption algorithm, and for every $i \in \{1, \ldots, 13\}$ the ciphertext $c^{(i)}$ is obtained from $c^{(i-1)}$ by applying the homomorphic evaluation algorithm using a function $f^{(i-1)} \in \mathcal{F}$. The internal nodes on levels 1, 2, and 3 contain succinct arguments for membership in the languages $L^{(1)}$, $L^{(2)}$, and $L^{(3)}$, respectively. The ciphertext of the new scheme consists of the black nodes and the functions $f^{(0)}$, $f^{(8)}$, and $f^{(12)}$.


– The ciphertext: The ciphertext always includes the initial ciphertext $c^{(0)}$ that was produced by the encryption algorithm, the first leaf $c^{(1)}$, and the function $f^{(0)} \in \mathcal{F}$ that was used for generating $c^{(1)}$ from $c^{(0)}$. Then, every time we compute the value of two adjacent internal nodes $v_L$ and $v_R$ at some level $j - 1$ that belong to the same subtree, we compute the value of their parent $v$ at level $j$, as described above. As a result, we no longer keep any information from the subtree of $v$, except for its leftmost and rightmost leaves. In addition, for every two adjacent subtrees we include the function that transforms the rightmost leaf of the first subtree to the leftmost leaf of the second subtree. Note that such subtrees must be of different depths, as otherwise they are merged. Thus, at any point in time a ciphertext may contain at most $2d + 1$ ciphertexts of the underlying scheme, $d$ short arguments, and $d$ descriptions of functions from $\mathcal{F}$ (connecting subtrees). See Figure 2 for an illustration of the structure of a ciphertext.

• Decryption: On input the secret key sk and a ciphertext of the above form, verify the validity
of the non-interactive zero-knowledge proof contained in $c^{(0)}$, verify that $c^{(1)}$ is obtained from
$c^{(0)}$ using the function $f^{(1)} \in \mathcal{F}$, verify that the given tree has the right structure (with
functions from $\mathcal{F}$ connecting subtrees), and verify the validity of all the arguments in the
non-empty internal nodes of the tree. If any of these verifications fail, then output $\perp$. Otherwise,
compute $m_0 = \mathsf{Dec}_{sk_0}(c_0)$ and $m_1 = \mathsf{Dec}_{sk_1}(c_1)$, where $(c_0, c_1)$ is the rightmost
leaf. If $m_0 \neq m_1$ then output $\perp$, and otherwise output $m_0$.
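The bookkeeping in the ciphertext description behaves like a binary counter: merging equal-depth neighbours leaves exactly one subtree per 1-bit of the number of operations. The Python sketch below is a toy model of that structure only (no cryptography). It assumes leaves are indexed from $c^{(1)}$ as in the description above, and the names `Subtree`, `append_leaf`, and `ciphertext_shape` are hypothetical. It counts the underlying ciphertexts, arguments, and connecting functions a ciphertext would carry.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    first_leaf: int  # index i of the leftmost leaf c^(i)
    size: int        # number of leaves covered; always a power of two

    @property
    def depth(self) -> int:
        return self.size.bit_length() - 1


def append_leaf(forest: list, i: int) -> None:
    """Add leaf c^(i); merge the two rightmost subtrees while they have equal depth."""
    forest.append(Subtree(first_leaf=i, size=1))
    while len(forest) >= 2 and forest[-1].size == forest[-2].size:
        right = forest.pop()
        left = forest.pop()
        # The new parent would carry one succinct argument; only the leftmost
        # and rightmost leaves of the merged subtree are retained.
        forest.append(Subtree(first_leaf=left.first_leaf, size=left.size + right.size))


def ciphertext_shape(num_ops: int) -> dict:
    """Count what a ciphertext stores after num_ops homomorphic operations (toy model)."""
    forest: list = []
    for i in range(1, num_ops + 1):
        append_leaf(forest, i)
    return {
        "subtree_sizes": [t.size for t in forest],
        # c^(0), plus two leaves per subtree (one if the subtree is a single leaf)
        "underlying_ciphertexts": 1 + sum(2 if t.size > 1 else 1 for t in forest),
        # one argument at the root of each subtree of depth >= 1
        "arguments": sum(1 for t in forest if t.depth >= 1),
        # index i of each f^(i) linking one subtree's rightmost leaf to the next subtree
        "connecting_functions": [t.first_leaf for t in forest[1:]],
    }


print(ciphertext_shape(13))
```

Under this indexing, 13 operations give subtrees of 8, 4, and 1 leaves, six underlying ciphertexts (including $c^{(0)}$), two arguments, and connecting functions $f^{(9)}$ and $f^{(13)}$; the subtree depths are always distinct, matching the observation that equal-depth neighbours are merged.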


Chosen-ciphertext  security.  As  this  scheme  is  obtained  from  the  one  in  Section  4  by  only 
changing  the  structure  of  the  arguments  that  generate  the  ciphertext,  the  proof  of  security  is 
rather similar to that in Sections 4.3 and 4.4. The only difference is in the way $S_2$ produces
the  “certification  chain”  for  the  ciphertext  that  the  adversary  outputs:  instead  of  using  the  path 
structure  of  the  ciphertext,  the  simulator  now  uses  the  tree  structure  of  the  ciphertext  and  applies 
the  knowledge  extractors  accordingly.  The  remainder  of  the  proof  is  exactly  the  same. 

6  Extensions  and  Open  Problems 

We  conclude  the  paper  with  a  discussion  of  several  extensions  of  our  work  and  open  problems. 
The number of repeated homomorphic operations. Our schemes allow any pre-specified
constant bound $t \in \mathbb{N}$ on the number of repeated homomorphic operations. It would be interesting
to allow this bound to be a function $t(k)$ of the security parameter. As discussed in Section 2.2,
the bottleneck is the super-linear length of the common-reference string in Groth’s and Lipmaa’s
argument systems [Gro10, Lip11]. Any improvement to these argument systems with a
common-reference string of linear length will directly allow any logarithmic number of repeated homomorphic
operations in the path-based scheme, and any polynomial number of such operations in the
tree-based scheme.

Function  privacy  and  unlinkability.  For  some  applications  a  homomorphic  encryption  scheme 
may  be  required  to  ensure  function  privacy  [GHY10]  or  even  unlinkability  [PR08].  Function  privacy 
asks  that  the  homomorphic  evaluation  algorithm  does  not  reveal  (in  a  semantic  security  fashion) 
which  operation  it  applies,  and  unlinkability  asks  that  the  output  of  the  homomorphic  evaluation 
algorithm  is  computationally  indistinguishable  from  the  output  of  the  encryption  algorithm.  For 
example,  the  voting  application  discussed  in  the  introduction  requires  function  privacy  to  ensure 
that  individual  votes  remain  private.  Our  approach  in  this  paper  focuses  on  preventing  a  blow-up  in 
the  length  of  ciphertexts,  and  incorporating  function  privacy  and  unlinkability  into  our  framework 
is an interesting direction for future work. We note that since Groth’s argument system [Gro10]
is also zero-knowledge, it is quite plausible to show that ciphertexts in our schemes reveal nothing
more  than  the  number  of  repeated  homomorphic  operations. 

A-posteriori  chosen-ciphertext  security  (CCA2).  As  discussed  in  Section  1.3,  Prabhakaran 
and Rosulek [PR08] considered the rather orthogonal problem of providing a homomorphic encryption
scheme that is secure against a meaningful variant of a-posteriori chosen-ciphertext attacks
(CCA2).  In  light  of  the  fact  that  our  schemes  already  offer  targeted  malleability  against  a-priori 
chosen-ciphertext  attacks  (CCA1),  it  would  be  interesting  to  extend  our  approach  to  the  setting 
considered  by  Prabhakaran  and  Rosulek. 
Abstract

We  give  new  methods  for  generating  and  using  “strong  trapdoors”  in  cryptographic  lattices,  which 
are  simultaneously  simple,  efficient,  easy  to  implement  (even  in  parallel),  and  asymptotically  optimal 
with  very  small  hidden  constants.  Our  methods  involve  a  new  kind  of  trapdoor,  and  include  specialized 
algorithms  for  inverting  LWE,  randomly  sampling  SIS  preimages,  and  securely  delegating  trapdoors. 
These  tasks  were  previously  the  main  bottleneck  for  a  wide  range  of  cryptographic  schemes,  and  our 
techniques  substantially  improve  upon  the  prior  ones,  both  in  terms  of  practical  performance  and  quality 
of  the  produced  outputs.  Moreover,  the  simple  structure  of  the  new  trapdoor  and  associated  algorithms  can 
be  exposed  in  applications,  leading  to  further  simplifications  and  efficiency  improvements.  We  exemplify 
the  applicability  of  our  methods  with  new  digital  signature  schemes  and  CCA-secure  encryption  schemes, 
which  have  better  efficiency  and  security  than  the  previously  known  lattice-based  constructions. 


1  Introduction 

Cryptography  based  on  lattices  has  several  attractive  and  distinguishing  features: 

•  On  the  security  front,  the  best  attacks  on  the  underlying  problems  require  exponential  time  in 
the main security parameter n, even for quantum adversaries. By contrast, for example, mainstream
factoring-based cryptography can be broken in subexponential $2^{\tilde{O}(n^{1/3})}$ time classically, and even in
polynomial $n^{O(1)}$ time using quantum algorithms. Moreover, lattice cryptography is supported by
strong  worst-case/average-case  security  reductions,  which  provide  solid  theoretical  evidence  that  the 
random  instances  used  in  cryptography  are  indeed  asymptotically  hard,  and  do  not  suffer  from  any 
unforeseen  “structural”  weaknesses. 
• On the efficiency and implementation fronts, lattice cryptography operations can be extremely simple,
fast and parallelizable. Typical operations are the selection of uniformly random integer matrices A
modulo some small q = poly(n), and the evaluation of simple linear functions like

$$f_{\mathbf{A}}(\mathbf{x}) := \mathbf{A}\mathbf{x} \bmod q \quad \text{and} \quad g_{\mathbf{A}}(\mathbf{s}, \mathbf{e}) := \mathbf{s}^t\mathbf{A} + \mathbf{e}^t \bmod q$$

on short integer vectors $\mathbf{x}, \mathbf{e}$.¹ (For commonly used parameters, $f_{\mathbf{A}}$ is surjective while $g_{\mathbf{A}}$ is injective.)
Often,  the  modulus  q  is  small  enough  that  all  the  basic  operations  can  be  directly  implemented  using 
machine-level  arithmetic.  By  contrast,  the  analogous  operations  in  number-theoretic  cryptography  (e.g., 
generating  huge  random  primes,  and  exponentiating  modulo  such  primes)  are  much  more  complex, 
admit  only  limited  parallelism  in  practice,  and  require  the  use  of  “big  number”  arithmetic  libraries. 
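To make the two functions concrete, here is a minimal pure-Python sketch of $f_{\mathbf{A}}$ and $g_{\mathbf{A}}$ on toy parameters. The values n = 4, m = 16, q = 257 and the {−1, 0, 1} choice for short vectors are arbitrary illustrative assumptions, far too small to be secure. Every step is a small-integer multiply-add reduced mod q, which is the kind of machine-level arithmetic the text refers to.

```python
import random


def f_A(A, x, q):
    """SIS-style function f_A(x) = A x mod q (A is n x m, x a short vector in Z^m)."""
    return [sum(a * xi for a, xi in zip(row, x)) % q for row in A]


def g_A(A, s, e, q):
    """LWE-style function g_A(s, e) = s^t A + e^t mod q, returned as a length-m list."""
    n, m = len(A), len(A[0])
    return [(sum(s[i] * A[i][j] for i in range(n)) + e[j]) % q for j in range(m)]


# Toy parameters, far too small to be secure.
n, m, q = 4, 16, 257
rng = random.Random(1)
A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]  # uniform public matrix
x = [rng.choice([-1, 0, 1]) for _ in range(m)]                # short input to f_A
s = [rng.randrange(q) for _ in range(n)]                      # uniform secret
e = [rng.choice([-1, 0, 1]) for _ in range(m)]                # short error
print(f_A(A, x, q))
print(g_A(A, s, e, q))
```

Both maps are linear over $\mathbb{Z}_q$; the shortness of x and e is what makes inverting them hard.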

In recent years lattice-based cryptography has also been shown to be extremely versatile, leading to a large
number of theoretical applications ranging from (hierarchical) identity-based encryption [GPV08, CHKP10,
ABB10a, ABB10b], to fully homomorphic encryption schemes [Gen09b, Gen09a, vGHV10, BV11b, BV11a,
GH11, BGV11], and much more (e.g., [LM08, PW08, Lyu08, PV08, PVW08, Pei09b, ACPS09, Rüc10,
Boy10, GHV10, GKV10]).

Not all lattice cryptography is as simple as selecting random matrices A and evaluating linear functions
like $f_{\mathbf{A}}(\mathbf{x}) = \mathbf{A}\mathbf{x} \bmod q$, however. In fact, such operations yield only collision-resistant hash functions,
public-key encryption schemes that are secure under passive attacks, and little else. Richer and more advanced
lattice-based cryptographic schemes, including chosen-ciphertext-secure encryption, “hash-and-sign” digital
signatures, and identity-based encryption also require generating a matrix A together with some “strong”
trapdoor, typically in the form of a nonsingular square matrix (a basis) S of short integer vectors such that
$\mathbf{A}\mathbf{S} = \mathbf{0} \bmod q$. (The matrix S is usually interpreted as a basis of a lattice defined by using A as a “parity
check” matrix.) Applications of such strong trapdoors also require certain efficient inversion algorithms for the
functions $f_{\mathbf{A}}$ and $g_{\mathbf{A}}$, using S. Appropriately inverting $f_{\mathbf{A}}$ can be particularly complex, as it typically requires
sampling random preimages of $f_{\mathbf{A}}(\mathbf{x})$ according to a Gaussian-like probability distribution (see [GPV08]).

Theoretical solutions for all the above tasks (generating A with strong trapdoor S [Ajt99, AP09], trapdoor
inversion of $g_{\mathbf{A}}$ and preimage sampling for $f_{\mathbf{A}}$ [GPV08]) are known, but they are rather complex and not very
suitable for practice, in either runtime or the “quality” of their outputs. (The quality of a trapdoor S roughly
corresponds to the Euclidean lengths of its vectors: shorter is better.) The current best method for trapdoor
generation [AP09] is conceptually and algorithmically complex, and involves costly computations of Hermite
normal forms and matrix inverses. And while the dimensions and quality of its output are asymptotically
optimal (or nearly so, depending on the precise notion of quality), the hidden constant factors are rather large.
Similarly, the standard methods for inverting $g_{\mathbf{A}}$ and sampling preimages of $f_{\mathbf{A}}$ [Bab85, Kle00, GPV08]
are inherently sequential and time-consuming, as they are based on an orthogonalization process that uses
high-precision real numbers. A more efficient and parallelizable method for preimage sampling (which
uses only small-integer arithmetic) has recently been discovered [Pei10], but it is still more complex than is
desirable for practice, and the quality of its output can be slightly worse than that of the sequential algorithm
when using the same trapdoor S.

More compact and efficient trapdoors appear necessary for bringing advanced lattice-based schemes
to practice, not only because of the current unsatisfactory runtimes, but also because the concrete security
of lattice cryptography can be quite sensitive to even small changes in the main parameters. As already
mentioned, two central objects are a uniformly random matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ that serves as a public key, and an
associated secret matrix $\mathbf{S} \in \mathbb{Z}^{m \times m}$ consisting of short integer vectors having “quality” $s$, where smaller
is better. Here $n$ is the main security parameter governing the hardness of breaking the functions, and $m$ is
the dimension of a lattice associated with A, which is generated by the vectors in S. Note that the security
parameter $n$ and lattice dimension $m$ need not be the same; indeed, typically we have $m = O(n \lg q)$, which
for many applications is optimal up to constant factors. (For simplicity, throughout this introduction we
use the base-2 logarithm; other choices are possible and yield tradeoffs among the parameters.) For the
trapdoor quality, achieving $s = O(\sqrt{m})$ is asymptotically optimal, and random preimages of $f_{\mathbf{A}}$ generated
using S have Euclidean length $\beta \approx s\sqrt{m}$. For security, it must be hard (without knowing the trapdoor) to find
any preimage having length bounded by $\beta$. Interestingly, the computational resources needed to do so can
increase dramatically with only a moderate decrease in the bound $\beta$ (see, e.g., [GN08, MR09]). Therefore,
improving the parameters $m$ and $s$ by even small constant factors can have a significant impact on concrete
security. Moreover, this can lead to a “virtuous cycle” in which the increased security allows for the use
of a smaller security parameter $n$, which leads to even smaller values of $m$, $s$, and $\beta$, etc. Note also that
the schemes’ key sizes and concrete runtimes are reduced as well, so improving the parameters yields a
“win-win-win” scenario of simultaneously smaller keys, increased concrete security, and faster operations.
(This phenomenon is borne out concretely; see Figure 2.)

¹ Inverting these functions corresponds to solving the “short integer solution” (SIS) problem [Ajt96] for $f_{\mathbf{A}}$, and the “learning
with errors” (LWE) problem [Reg05] for $g_{\mathbf{A}}$, both of which are widely used in lattice cryptography and enjoy provable worst-case
hardness.

1.1  Contributions 

The  first  main  contribution  of  this  paper  is  a  new  method  of  trapdoor  generation  for  cryptographic  lattices, 
which  is  simultaneously  simple,  efficient,  easy  to  implement  (even  in  parallel),  and  asymptotically  optimal 
with  small  hidden  constants.  The  new  trapdoor  generator  strictly  subsumes  the  prior  ones  of  [Ajt99,  AP09], 
in  that  it  proves  the  main  theorems  from  those  works,  but  with  improved  concrete  bounds  for  all  the 
relevant  quantities  (simultaneously),  and  via  a  conceptually  simpler  and  more  efficient  algorithm.  To 
accompany our trapdoor generator, we also give specialized algorithms for trapdoor inversion (for $g_{\mathbf{A}}$) and
preimage sampling (for $f_{\mathbf{A}}$), which are simpler and more efficient in our setting than the prior general
solutions [Bab85, Kle00, GPV08, Pei10].

Our methods yield large constant-factor improvements, and in some cases even small asymptotic
improvements, in the lattice dimension $m$, trapdoor quality $s$, and storage size of the trapdoor. Because trapdoor
generation and inversion algorithms are the main operations in many lattice cryptography schemes, our
algorithms can be plugged in as ‘black boxes’ to deliver significant concrete improvements in all such
applications. Moreover, it is often possible to expose the special (and very simple) structure of our trapdoor directly
in  cryptographic  schemes,  yielding  additional  improvements  and  potentially  new  applications.  (Below  we 
summarize  a  few  improvements  to  existing  applications,  with  full  details  in  Section  6.) 

We now give a detailed comparison of our results with the most relevant prior works [Ajt99, AP09,
GPV08, Pei10]. The quantitative improvements are summarized in Figure 1.

Simpler,  faster  trapdoor  generation  and  inversion  algorithms.  Our  trapdoor  generator  is  exceedingly 
simple,  especially  as  compared  with  the  prior  constructions  [Ajt99,  AP09].  It  essentially  amounts  to  just  one 
multiplication  of  two  random  matrices,  whose  entries  are  chosen  independently  from  appropriate  probability 
distributions.  Surprisingly,  this  method  is  nearly  identical  to  Ajtai’s  original  method  [Ajt96]  of  generating  a 
random  lattice  together  with  a  “weak”  trapdoor  of  one  or  more  short  vectors  (but  not  a  full  basis),  with  one 
added twist. And while there are no detailed runtime analyses or public implementations of [Ajt99, AP09],
it  is  clear  from  inspection  that  our  new  method  is  significantly  more  efficient,  since  it  does  not  involve  any 
expensive  Hermite  normal  form  or  matrix  inversion  computations. 
Our specialized, parallel inversion algorithms for $f_{\mathbf{A}}$ and $g_{\mathbf{A}}$ are also simpler and more practically
efficient than the general solutions of [Bab85, Kle00, GPV08, Pei10] (though we note that our trapdoor
generator is entirely compatible with those general algorithms as well). In particular, we give the first parallel
algorithm for inverting $g_{\mathbf{A}}$ under asymptotically optimal error rates (previously, handling such large errors
required the sequential “nearest-plane” algorithm of [Bab85]), and our preimage sampling algorithm for $f_{\mathbf{A}}$
works with smaller integers and requires much less offline storage than the one from [Pei10].

Tighter parameters. To generate a matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ that is within negligible statistical distance of
uniform, our new trapdoor construction improves the lattice dimension from $m > 5n \lg q$ [AP09] down to
$m \approx 2n \lg q$. (In both cases, the base of the logarithm is a tunable parameter that appears as a multiplicative
factor in the quality of the trapdoor; here we fix upon base 2 for concreteness.) In addition, we give the first
known computationally pseudorandom construction (under the LWE assumption), where the dimension can
be as small as $m = n(1 + \lg q)$, although at the cost of an $\Omega(\sqrt{n})$ factor worse quality $s$.

Our construction also greatly improves the quality $s$ of the trapdoor. The best prior construction [AP09]
produces a basis whose Gram-Schmidt quality (i.e., the maximum length of its Gram-Schmidt orthogonalized
vectors) was loosely bounded by $20\sqrt{n \lg q}$. However, the Gram-Schmidt notion of quality is useful only
for less efficient, sequential inversion algorithms [Bab85, GPV08] that use high-precision real arithmetic.
For the more efficient, parallel preimage sampling algorithm of [Pei10] that uses small-integer arithmetic,
the parameters guaranteed by [AP09] are asymptotically worse, at $m > n \lg^2 q$ and $s > 16\sqrt{n \lg^2 q}$. By
contrast, our (statistically secure) trapdoor construction achieves the “best of both worlds”: asymptotically
optimal dimension $m \approx 2n \lg q$ and quality $s \approx 1.6\sqrt{n \lg q}$ or better, with a parallel preimage sampling
algorithm that is slightly more efficient than the one of [Pei10].

Altogether, for any $n$ and typical values of $q \geq 2^{16}$, we conservatively estimate that the new trapdoor
generator and inversion algorithms collectively provide at least a $7 \lg q \geq 112$-fold improvement in the
length bound $\beta \approx s\sqrt{m}$ for $f_{\mathbf{A}}$ preimages (generated using an efficient algorithm). We also obtain similar
improvements in the size of the error terms that can be handled when efficiently inverting $g_{\mathbf{A}}$.

New,  smaller  trapdoors.  As  an  additional  benefit,  our  construction  actually  produces  a  new  kind  of 
trapdoor  —  not  a  basis  —  that  is  at  least  4  times  smaller  in  storage  than  a  basis  of  corresponding  quality, 
and  is  at  least  as  powerful,  i.e.,  a  good  basis  can  be  efficiently  derived  from  the  new  trapdoor.  We  stress  that 
our  specialized  inversion  algorithms  using  the  new  trapdoor  provide  almost  exactly  the  same  quality  as  the 
inefficient,  sequential  algorithms  using  a  derived  basis,  so  there  is  no  trade-off  between  efficiency  and  quality. 
(This is in contrast with [Pei10] when using a basis generated according to [AP09].) Moreover, the storage
size of the new trapdoor grows only linearly in the lattice dimension m, rather than quadratically as a basis
does. This is most significant for applications like hierarchical ID-based encryption [CHKP10, ABB10a]
that  delegate  trapdoors  for  increasing  values  of  m.  The  new  trapdoor  also  admits  a  very  simple  and  efficient 
delegation  mechanism,  which  unlike  the  prior  method  [CHKP10]  does  not  require  any  costly  operations  like 
linear  independence  tests,  or  conversions  from  a  full-rank  set  of  lattice  vectors  into  a  basis.  In  summary, 
the  new  type  of  trapdoor  and  its  associated  algorithms  are  strictly  preferable  to  a  short  basis  in  terms  of 
algorithmic  efficiency,  output  quality,  and  storage  size  (simultaneously). 

Ring-based constructions. Finally, and most importantly for practice, all of the above-described
constructions and algorithms extend immediately to the ring setting, where functions analogous to $f_{\mathbf{A}}$ and $g_{\mathbf{A}}$ require
only quasi-linear $\tilde{O}(n)$ space and time to specify and evaluate (respectively), which is a factor of $\tilde{\Omega}(n)$
improvement over the matrix-based functions defined above. See the representative works [Mic02, PR06,
LM06, LMPR08, LPR10] for more details on these functions and their security foundations.
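As a hedged illustration of where the space saving comes from, the sketch below assumes the common choice $R_q = \mathbb{Z}_q[x]/(x^n + 1)$ with $n$ a power of two (the text above does not fix a particular ring). A single ring element of $n$ coefficients specifies the same linear map as an $n \times n$ negacyclic matrix. The schoolbook product here is quadratic; the quasi-linear evaluation time would come from an FFT/NTT-style multiplication, which is omitted for brevity. The function names are hypothetical.

```python
import random


def ring_mul(a, b, q):
    """Multiply a, b in R_q = Z_q[x]/(x^n + 1) (schoolbook, O(n^2) operations).
    An NTT/FFT-based multiplication would make this quasi-linear."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]  # wrap around using x^n = -1
    return [v % q for v in c]


def as_matrix(a, q):
    """The n x n matrix over Z_q that multiplication by a represents (column j = a * x^j)."""
    n = len(a)
    cols = [ring_mul(a, [1 if t == j else 0 for t in range(n)], q) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


n, q = 8, 257  # toy sizes; n a power of two
rng = random.Random(5)
a = [rng.randrange(q) for _ in range(n)]        # n coefficients describe the map
M = as_matrix(a, q)                             # the unstructured view needs n^2 entries
b = [rng.choice([-1, 0, 1]) for _ in range(n)]  # short input
print(ring_mul(a, b, q))
```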
| | [Ajt99, AP09] constructions | This work (fast $f_{\mathbf{A}}^{-1}$) | Factor Improvement |
|---|---|---|---|
| Dimension $m$ | slow $f_{\mathbf{A}}^{-1}$ [Kle00, GPV08]: $> 5n \lg q$; fast $f_{\mathbf{A}}^{-1}$ [Pei10]: $> n \lg^2 q$ | $2n \lg q$ ($\stackrel{s}{\approx}$); $n(1 + \lg q)$ ($\stackrel{c}{\approx}$) | 2.5 – $\lg q$ |
| Quality $s$ | slow $f_{\mathbf{A}}^{-1}$: $\approx 20\sqrt{n \lg q}$; fast $f_{\mathbf{A}}^{-1}$: $\approx 16\sqrt{n \lg^2 q}$ | $\approx 1.6\sqrt{n \lg q}$ ($\stackrel{s}{\approx}$) | 12.5 – $10\sqrt{\lg q}$ |
| Length $\beta \approx s\sqrt{m}$ | slow $f_{\mathbf{A}}^{-1}$: $> 45 n \lg q$; fast $f_{\mathbf{A}}^{-1}$: $> 16 n \lg^2 q$ | $\approx 2.3\, n \lg q$ ($\stackrel{s}{\approx}$) | 19 – $7 \lg q$ |

Figure 1: Summary of parameters for our constructions and algorithms versus prior ones. In the column
labelled “this work,” $\stackrel{s}{\approx}$ and $\stackrel{c}{\approx}$ denote constructions producing public keys A that are statistically close to
uniform, and computationally pseudorandom, respectively. (All quality terms $s$ and length bounds $\beta$ omit the
same statistical “smoothing” factor for $\mathbb{Z}$, which is about 4–5 in practice.)


To illustrate the kinds of concrete improvements that our methods provide, in Figure 2 we give
representative parameters for the canonical application of GPV signatures [GPV08], comparing the old and
new  trapdoor  constructions  for  nearly  equal  levels  of  concrete  security.  We  stress  that  these  parameters  are 
not  highly  optimized,  and  making  adjustments  to  some  of  the  tunable  parameters  in  our  constructions  may 
provide  better  combinations  of  efficiency  and  concrete  security.  We  leave  this  effort  for  future  work. 

1.2  Techniques 

The main idea behind our new method of trapdoor generation is as follows. Instead of building a random
matrix A through some specialized and complex process, we start from a carefully crafted public matrix G
(and its associated lattice), for which the associated functions $f_{\mathbf{G}}$ and $g_{\mathbf{G}}$ admit very efficient (in both
sequential and parallel complexity) and high-quality inversion algorithms. In particular, preimage sampling
for $f_{\mathbf{G}}$ and inversion for $g_{\mathbf{G}}$ can be performed in essentially $O(n \log n)$ sequential time, and can even be
performed by $n$ parallel $O(\log n)$-time operations or table lookups. (This should be compared with the
general algorithms for these tasks, which require at least quadratic $\Omega(n^2 \log^2 n)$ time, and are not always
parallelizable for optimal noise parameters.) We emphasize that G is not a cryptographic key, but rather a
fixed and public matrix that may be used by all parties, so the implementation of all its associated operations
can be highly optimized, in both software and hardware. We also mention that the simplest and most
practically efficient choices of G work for a modulus q that is a power of a small prime, such as $q = 2^k$, but a
crucial search/decision reduction for LWE was not previously known for such q, despite its obvious practical
utility. In Section 3 we provide a very general reduction that covers this case and others, and subsumes all of
the known (and incomparable) search/decision reductions for LWE [BFKL93, Reg05, Pei09b, ACPS09].
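For the simplest case $q = 2^k$ mentioned above, the following sketch shows why gadget operations are so cheap. It assumes the power-of-two gadget vector $\mathbf{g} = (1, 2, \dots, 2^{k-1})$, with $\mathbf{G} = \mathbf{I}_n \otimes \mathbf{g}^t$ so that the $n$ coordinates are handled independently; this specific form of G is an assumption here, since the passage above does not define G. Both routines take $O(k) = O(\log q)$ small-integer steps per coordinate. The preimage routine is deterministic binary decomposition, not the randomized sampler the text refers to, and the error bound $|e_i| < q/4$ is an assumption of this sketch. All function names are hypothetical.

```python
import random


def gadget(k):
    """Assumed power-of-two gadget vector g = (1, 2, 4, ..., 2^(k-1)) for q = 2^k."""
    return [1 << i for i in range(k)]


def f_g_preimage(u, k):
    """A 0/1 vector x with <g, x> = u mod 2^k, via binary decomposition.
    (Deterministic: a real preimage sampler would randomize x with a Gaussian-like law.)"""
    return [(u >> i) & 1 for i in range(k)]


def g_g_invert(b, k):
    """Recover s from b_i = s * 2^i + e_i mod q (q = 2^k), assuming every |e_i| < q/4.
    Bits of s are recovered from least to most significant, one entry of b per bit."""
    q = 1 << k
    s = 0
    for j in range(k):
        i = k - 1 - j
        # With the low j bits of s known, (s - s_low) * 2^i is congruent to
        # bit_j(s) * q/2 mod q, so r is bit_j(s) * q/2 + e_i (mod q).
        r = (b[i] - s * (1 << i)) % q
        if q // 4 <= r < 3 * q // 4:
            s |= 1 << j
    return s


k = 16
q = 1 << k
rng = random.Random(2)
secret = rng.randrange(q)
errors = [rng.randrange(-(q // 4) + 1, q // 4) for _ in range(k)]
b = [(secret * gi + ei) % q for gi, ei in zip(gadget(k), errors)]
print(g_g_invert(b, k) == secret)
```

Because each coordinate is decoded on its own, a vector instance splits into $n$ independent runs of this loop, which is the kind of parallelism the text describes.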

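To generate a random matrix A with a trapdoor, we take two additional steps: first, we extend G
into a semi-random matrix $\mathbf{A}' = [\bar{\mathbf{A}} \mid \mathbf{G}]$, for uniform $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n \times \bar{m}}$ and sufficiently large $\bar{m}$. (As shown
in [CHKP10], inversion of $g_{\mathbf{A}'}$ and preimage sampling for $f_{\mathbf{A}'}$ reduce very efficiently to the corresponding
tasks for $g_{\mathbf{G}}$ and $f_{\mathbf{G}}$.) Finally, we simply apply to $\mathbf{A}'$ a certain random unimodular transformation defined by
the matrix $\mathbf{T} = \begin{bmatrix} \mathbf{I} & -\mathbf{R} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}$, for a random “short” secret matrix $\mathbf{R}$ that will serve as the trapdoor, to obtain

$$\mathbf{A} = \mathbf{A}' \cdot \mathbf{T} = [\bar{\mathbf{A}} \mid \mathbf{G} - \bar{\mathbf{A}}\mathbf{R}].$$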
| | [AP09] with fast $f_{\mathbf{A}}^{-1}$ | This work | Factor Improvement |
|---|---|---|---|
| Sec param $n$ | 436 | 284 | 1.5 |
| Modulus $q$ | $2^{32}$ | $2^{24}$ | $2^{8}$ |
| Dimension $m$ | 446,644 | 13,812 | 32.3 |
| Quality $s$ | $10.7 \times 10^3$ | 418 | 25.6 |
| Length $\beta$ | $12.9 \times 10^6$ | $91.6 \times 10^3$ | 141 |
| Key size (bits) | $6.22 \times 10^9$ | $92.2 \times 10^6$ | 67.5 |
| Key size (ring-based) | $\approx 16 \times 10^6$ | $\approx 361 \times 10^3$ | $\approx 44.3$ |

Figure 2: Representative parameters for GPV signatures (using fast inversion algorithms) for the old and new
trapdoor generation methods. Using the methodology from [MR09], both sets of parameters have security
level corresponding to a parameter $\delta$ of at most 1.007, which is estimated to require about $2^{46}$ core-years
on a 64-bit 1.86 GHz Xeon using the state of the art in lattice basis reduction [GN08, CN11]. We use a
smoothing parameter of $r = 4.5$ for $\mathbb{Z}$, which corresponds to statistical error of less than $2^{-90}$ for each
randomized-rounding operation during signing. Key sizes are calculated using the Hermite normal form
optimization. Key sizes for ring-based GPV signatures are approximated to be smaller by a factor of about
$0.9n$.
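The factor-improvement column and the ring-based key sizes can be re-derived from the other entries of Figure 2. The short Python check below only redoes that arithmetic: the ratio of old to new values for each row, and the plain key size divided by $0.9n$ as stated in the caption. It introduces no new data.

```python
# Figure 2 entries: (AP09 with fast inversion, this work, reported factor improvement)
rows = {
    "sec param n": (436, 284, 1.5),
    "dimension m": (446_644, 13_812, 32.3),
    "quality s": (10.7e3, 418, 25.6),
    "length beta": (12.9e6, 91.6e3, 141),
    "key size (bits)": (6.22e9, 92.2e6, 67.5),
    "key size (ring-based)": (16e6, 361e3, 44.3),
}

for name, (old, new, reported) in rows.items():
    digits = 0 if reported >= 100 else 1  # match the table's rounding
    print(f"{name}: {old / new:.{digits}f} (table: {reported})")

# Ring-based key sizes: plain key size divided by about 0.9n.
for key_bits, n in [(6.22e9, 436), (92.2e6, 284)]:
    print(f"{key_bits / (0.9 * n):.3g}")
```

Each ratio rounds to the reported factor, and the ring-based estimates come out at about $1.59 \times 10^7$ and $3.61 \times 10^5$ bits, consistent with the ≈16 × 10^6 and ≈361 × 10^3 entries.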

The  transformation  given  by  T  has  the  following  properties: 

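• It is very easy to compute and invert, requiring essentially just one multiplication by $\mathbf{R}$ in both cases.
(Note that $\mathbf{T}^{-1} = \begin{bmatrix} \mathbf{I} & \mathbf{R} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}$.)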

• It results in a matrix A that is distributed essentially uniformly at random, as required by the security
reductions (and worst-case hardness proofs) for lattice-based cryptographic schemes.

• For the resulting functions $f_{\mathbf{A}}$ and $g_{\mathbf{A}}$, preimage sampling and inversion very simply and efficiently
reduce to the corresponding tasks for $f_{\mathbf{G}}$ and $g_{\mathbf{G}}$. The overhead of the reduction is essentially just a single
matrix-vector product with the secret matrix $\mathbf{R}$ (which, when inverting $f_{\mathbf{A}}$, can largely be precomputed
even before the target value is known).
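These properties can be checked on a toy instance. The Python sketch below (all sizes hypothetical and insecure) builds $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$ with a short $\mathbf{R}$ whose entries lie in {−1, 0, 1}. It then confirms the trapdoor relation $\mathbf{A}\begin{bmatrix}\mathbf{R} \\ \mathbf{I}\end{bmatrix} = \mathbf{G}$, which follows from $\bar{\mathbf{A}}\mathbf{R} + \mathbf{G} - \bar{\mathbf{A}}\mathbf{R} = \mathbf{G}$, and reduces finding a short preimage under $f_{\mathbf{A}}$ to the gadget at the cost of one product with $\mathbf{R}$. It assumes the power-of-two gadget $\mathbf{G} = \mathbf{I}_n \otimes (1, 2, \dots, 2^{k-1})$ and uses deterministic bit decomposition instead of Gaussian sampling. Without the perturbation (“convolution”) step, such preimages would leak $\mathbf{R}$, so this is a correctness illustration only.

```python
import random

# Toy parameters (hypothetical, insecure): q = 2^k, n rows, m_bar uniform columns.
n, k, m_bar = 2, 8, 6
q = 1 << k
w = n * k  # number of gadget columns
rng = random.Random(7)


def matmul(X, Y):
    """Matrix product mod q over plain nested lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) % q for row in X]


# Assumed gadget G = I_n (x) (1, 2, ..., 2^(k-1)).
G = [[1 << (j - i * k) if i * k <= j < (i + 1) * k else 0 for j in range(w)]
     for i in range(n)]

A_bar = [[rng.randrange(q) for _ in range(m_bar)] for _ in range(n)]    # uniform part
R = [[rng.choice([-1, 0, 1]) for _ in range(w)] for _ in range(m_bar)]  # short trapdoor

# Public key A = [A_bar | G - A_bar R]
A_bar_R = matmul(A_bar, R)
A = [A_bar[i] + [(G[i][j] - A_bar_R[i][j]) % q for j in range(w)] for i in range(n)]

# Stacked matrix [R; I], which should satisfy A [R; I] = G (mod q)
R_over_I = R + [[1 if r == c else 0 for c in range(w)] for r in range(w)]


def preimage(u):
    """Short x with A x = u (mod q): solve G x' = u by bit decomposition, then x = [R x'; x'].
    No perturbation is added, so unlike the paper's sampler this output leaks R."""
    x_prime = [(ui >> b) & 1 for ui in u for b in range(k)]
    return [sum(R[r][j] * x_prime[j] for j in range(w)) for r in range(m_bar)] + x_prime


u = [rng.randrange(q) for _ in range(n)]
print(matvec(A, preimage(u)) == u)
```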

As a result, the cost of the inversion operations ends up being very close to that of computing $f_{\mathbf{A}}$ and $g_{\mathbf{A}}$ in the
forward direction. Moreover, the fact that the running time is dominated by matrix-vector multiplications with
the fixed trapdoor matrix $\mathbf{R}$ yields theoretical (but asymptotically significant) improvements in the context
of batch execution of several operations relative to the same secret key $\mathbf{R}$: instead of evaluating several
products $\mathbf{R}\mathbf{z}_1, \mathbf{R}\mathbf{z}_2, \dots, \mathbf{R}\mathbf{z}_n$ individually at a total cost of $\Theta(n^3)$, one can employ fast matrix multiplication
techniques to evaluate $\mathbf{R}[\mathbf{z}_1, \dots, \mathbf{z}_n]$ as a whole in subcubic time. Batch operations can be exploited in
applications like the multi-bit IBE of [GPV08] and its extensions to HIBE [CHKP10, ABB10a, ABB10b].

Related techniques. On the surface, our trapdoor generator appears similar to the original “GGH” approach
of [GGH97] for generating a lattice together with a short basis. That technique works by choosing some
random short vectors as the secret “good basis” of a lattice, and then transforming them into a public “bad basis”
for  the  same  lattice,  via  a  unimodular  matrix  having  large  entries.  (Note,  though,  that  this  does  not  produce 
a  lattice  from  Ajtai’s  worst-case-hard  family.)  A  closer  look  reveals,  however,  that  (worst-case  hardness 
aside)  our  method  is  actually  not  an  instance  of  the  GGH  paradigm:  here  the  initial  short  basis  of  the  lattice 
defined by G (or the semi-random matrix $[\bar{\mathbf{A}} \mid \mathbf{G}]$) is fixed and public, while the random unimodular matrix
$\mathbf{T} = \begin{bmatrix} \mathbf{I} & -\mathbf{R} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}$ actually produces a new lattice by applying a (reversible) linear transformation to the original
lattice. In other words, in contrast with GGH we multiply a (short) unimodular matrix on the “other side” of
the original short basis, thus changing the lattice it generates.

A  more  appropriate  comparison  is  to  Ajtai’s  original  method  [Ajt96]  for  generating  a  random  A  together 
with  a  “weak”  trapdoor  of  one  or  more  short  lattice  vectors  (but  not  a  full  basis).  There,  one  simply  chooses  a 
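semi-random matrix $\mathbf{A}' = [\bar{\mathbf{A}} \mid \mathbf{0}]$ and outputs $\mathbf{A} = \mathbf{A}' \cdot \mathbf{T} = [\bar{\mathbf{A}} \mid -\bar{\mathbf{A}}\mathbf{R}]$, with short vectors $\begin{bmatrix} \mathbf{R} \\ \mathbf{I} \end{bmatrix}$. Perhaps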
surprisingly,  our  strong  trapdoor  generator  is  just  a  simple  twist  on  Ajtai’s  original  weak  generator,  replacing 
0  with  the  gadget  G. 
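Ajtai's weak trapdoor can also be checked on a toy instance. In the hedged sketch below (parameters hypothetical and insecure), replacing G by the zero matrix gives $\mathbf{A} = [\bar{\mathbf{A}} \mid -\bar{\mathbf{A}}\mathbf{R}]$, and every column of $\begin{bmatrix}\mathbf{R} \\ \mathbf{I}\end{bmatrix}$ is a short vector in the kernel of A modulo q. These are only a few short lattice vectors, not a full basis, which is what makes the trapdoor “weak” in the sense above.

```python
import random

# Toy sizes (hypothetical, insecure).
n, m_bar, w, q = 3, 8, 5, 97
rng = random.Random(3)
A_bar = [[rng.randrange(q) for _ in range(m_bar)] for _ in range(n)]
R = [[rng.choice([-1, 0, 1]) for _ in range(w)] for _ in range(m_bar)]

# Ajtai-style key: A = [A_bar | -A_bar R], i.e. the gadget G replaced by 0.
A = [A_bar[i] + [-sum(A_bar[i][t] * R[t][j] for t in range(m_bar)) % q for j in range(w)]
     for i in range(n)]


def column(j):
    """j-th column of [R; I]: a short vector with entries in {-1, 0, 1}."""
    return [R[t][j] for t in range(m_bar)] + [1 if r == j else 0 for r in range(w)]


def in_kernel(v):
    """True if A v = 0 (mod q)."""
    return all(sum(a * b for a, b in zip(row, v)) % q == 0 for row in A)


print(all(in_kernel(column(j)) for j in range(w)))
```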

Our  constructions  and  inversion  algorithms  also  draw  upon  several  other  techniques  from  throughout  the 
literature.  The  trapdoor  basis  generator  of  [AP09]  and  the  LWE-based  “lossy”  injective  trapdoor  function 
of  [PW08]  both  use  a  fixed  “gadget”  matrix  analogous  to  G,  whose  entries  grow  geometrically  in  a  structured 
way.  In  both  cases,  the  gadget  is  concealed  (either  statistically  or  computationally)  in  the  public  key  by 
a  small  combination  of  uniformly  random  vectors.  Our  method  for  adding  tags  to  the  trapdoor  is  very 
similar to a technique for doing the same with the lossy TDF of [PW08], and is identical to the method used
in [ABB10a] for constructing compact (H)IBE. Finally, in our preimage sampling algorithm for $f_{\mathbf{A}}$, we use
the “convolution” technique from [Pei10] to correct for some statistical skew that arises when converting
preimages for $f_{\mathbf{G}}$ to preimages for $f_{\mathbf{A}}$, which would otherwise leak information about the trapdoor $\mathbf{R}$.

1.3  Applications 

Our  improved  trapdoor  generator  and  inversion  algorithms  can  be  plugged  into  any  scheme  that  uses  such  tools 
as  a  “black  box,”  and  the  resulting  scheme  will  inherit  all  the  efficiency  improvements.  (Every  application 
we  know  of  admits  such  a  black-box  replacement.)  Moreover,  the  special  properties  of  our  methods  allow 
for  further  improvements  to  the  design,  efficiency,  and  security  reductions  of  existing  schemes.  Here  we 
summarize some representative improvements that are possible to obtain; see Section 6 for complete details.

Hash-and-sign  digital  signatures.  Our  construction  and  supporting  algorithms  plug  directly  into  the  “full 
domain  hash”  signature  scheme  of  [GPV08],  which  is  strongly  unforgeable  in  the  random  oracle  model,  with 
a  tight  security  reduction.  One  can  even  use  our  computationally  secure  trapdoor  generator  to  obtain  a  smaller 
public  verification  key,  though  at  the  cost  of  a  hardness-of-LWE  assumption,  and  a  somewhat  stronger  SIS 
assumption  (which  affects  concrete  security).  Determining  the  right  balance  between  key  size  and  security  is 
left  for  later  work. 

In the standard model, there are two closely related types of hash-and-sign signature schemes:

•  The one of [CHKP10], which has signatures of bit length $\tilde{O}(n^2)$, and is existentially unforgeable (later
improved to be strongly unforgeable [Rüc10]) assuming the hardness of inverting $f_{\mathbf{A}}$ with solution
length bounded by $\beta = \tilde{O}(n^{1.5})$.²

•  The scheme of [Boy10], a lattice analogue of the pairing-based signature of [Wat05], which has
signatures of bit length $\tilde{O}(n)$ and is existentially unforgeable assuming the hardness of inverting $f_{\mathbf{A}}$
with solution length bounded by $\beta = \tilde{O}(n^{3.5})$.

We improve the latter scheme in several ways, by: (i) improving the length bound to $\beta = \tilde{O}(n^{2.5})$; (ii) reducing
the online runtime of the signing algorithm from $\tilde{O}(n^3)$ to $\tilde{O}(n^2)$ via chameleon hashing [KR00]; (iii) making
the scheme strongly unforgeable à la [GPV08, Rüc10]; (iv) giving a tighter and simpler security reduction
(using a variant of the “prefix technique” [HW09] as in [CHKP10]), where the reduction’s advantage degrades
only linearly in the number of signature queries; and (v) removing all additional constraints on the parameters
n and q (aside from those needed to ensure hardness of the SIS problem). We stress that the scheme itself
is essentially the same (up to the improved and generalized parameters, and chameleon hashing) as that
of [Boy10]; only the security proof and underlying assumption are improved. Note that in comparison
with [CHKP10], there is still a trade-off between the bit length of the signatures and the bound $\beta$ in the
underlying SIS assumption; this appears to be inherent to the style of the security reduction. Note also that the
public keys in all of these schemes are still rather large at $\tilde{O}(n^3)$ bits (or $\tilde{O}(n^2)$ bits using the ring analogue
of SIS), so they are still mainly of theoretical interest. Improving the key sizes of standard-model signatures
is an important open problem.

² All parameters in this discussion assume a message length of $\tilde{O}(n)$ bits.

Chosen ciphertext-secure encryption. We give a new construction of CCA-secure public-key encryption (in the
standard model) from the learning with errors (LWE) problem with error rate $\alpha = 1/\mathrm{poly}(n)$, where larger $\alpha$
corresponds to a harder concrete problem. Existing schemes exhibit various incomparable tradeoffs between
key size and error rate. The first such scheme is due to [PW08]: it has public keys of size $\tilde{O}(n^2)$ bits (with
somewhat large hidden factors) and relies on a quite small LWE error rate of $\alpha = \tilde{O}(1/n^4)$. The next scheme,
from [Pei09b], has larger public keys of $\tilde{O}(n^3)$ bits, but uses a better error rate of $\alpha = \tilde{O}(1/n)$. Finally, using
the generic conversion from selectively secure ID-based encryption to CCA-secure encryption [BCHK07],
one can obtain from [ABB10a] a scheme having key size $\tilde{O}(n^2)$ bits and using error rate $\alpha = \tilde{O}(1/n^2)$.
(Here decryption is randomized, since the IBE key-derivation algorithm is.) In particular, the public key of
the scheme from [ABB10b] consists of 3 matrices in $\mathbb{Z}_q^{n \times m}$, where m is large enough to embed a (strong)
trapdoor, plus essentially one vector in $\mathbb{Z}_q^n$ per message bit.

We give a CCA-secure system that enjoys the best of all prior constructions, which has $\tilde{O}(n^2)$-bit public
keys, uses error rate $\alpha = \tilde{O}(1/n)$ (both with small hidden factors), and has deterministic decryption. To
achieve this, we need to go beyond just plugging our improved trapdoor generator as a black box into prior
constructions. Our scheme relies on the particular structure of the trapdoor instances; in effect, we directly
construct a “tag-based adaptive trapdoor function” [KMO10]. The public key consists of only 1 matrix with
an embedded (strong) trapdoor, rather than 3 as in the most compact scheme to date [ABB10a]; moreover,
we can encrypt up to $n \log q$ message bits per ciphertext without needing any additional public key material.
Combining these design changes with the improved dimension of our trapdoor generator, we obtain more than
a 7.5-fold improvement in the public key size as compared with [ABB10a]. (This figure does not account for
removing  the  extra  public  key  material  for  the  message  bits,  nor  the  other  parameter  improvements  implied 
by  our  weaker  concrete  LWE  assumption,  which  would  shrink  the  keys  even  further.) 

(Hierarchical) identity-based encryption. Just as with signatures, our constructions plug directly into the
random-oracle IBE of [GPV08]. In the standard-model depth-d hierarchical IBEs of [CHKP10, ABB10a],
our techniques can shrink the public parameters by an additional factor of about $\frac{2+4d}{1+d} \in [3, 4]$, relative to
just plugging our improved trapdoor generator as a “black box” into the schemes. This is because for each
level of the hierarchy, the public parameters only need to contain one matrix of the same dimension as G
(i.e., about $n \lg q$), rather than two full trapdoor matrices (of dimension about $2n \lg q$ each).³ Because the
adaptation is straightforward given the tools developed in this work, we omit the details.

³ We note that in [Pei09a] (an earlier version of [CHKP10]) the schemes are defined in a similar way using lower-dimensional
extensions, rather than full trapdoor matrices at each level.

1.4  Other  Related  Work


Concrete parameter settings for a variety of “strong” trapdoor applications are given in [RS10]. Those parameters
are derived using the previous suboptimal generator of [AP09], and using the methods from this work would
yield substantial improvements. The recent work of [LP11] also gives improved key sizes and concrete
security  for  LWE-based  cryptosystems;  however,  that  work  deals  only  with  IND-CPA-secure  encryption, 
and  not  at  all  with  strong  trapdoors  or  the  further  applications  they  enable  (CCA  security,  digital  signatures, 
(H)IBE,  etc.). 

2  Preliminaries 

We denote the real numbers by $\mathbb{R}$ and the integers by $\mathbb{Z}$. For a nonnegative integer k, we let $[k] = \{1, \ldots, k\}$.
Vectors are denoted by lower-case bold letters (e.g., $\mathbf{x}$) and are always in column form ($\mathbf{x}^t$ is a row vector).
We denote matrices by upper-case bold letters, and treat a matrix $\mathbf{X}$ interchangeably with its ordered set
$\{\mathbf{x}_1, \mathbf{x}_2, \ldots\}$ of column vectors. For convenience, we sometimes use a scalar s to refer to the scaled identity
matrix $s\mathbf{I}$, where the dimension will be clear from context.

The statistical distance between two distributions X, Y over a finite or countable domain D is
$\Delta(X, Y) = \frac{1}{2} \sum_{w \in D} |X(w) - Y(w)|$. Statistical distance is a metric, and in particular obeys the triangle inequality. We
say that a distribution over D is $\epsilon$-uniform if its statistical distance from the uniform distribution is at most $\epsilon$.
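For finitely supported distributions the definition can be evaluated directly. The following Python sketch is only an illustration under our own representation choice (a distribution is a dict mapping outcomes to probabilities); the helper names `statistical_distance` and `is_eps_uniform` are hypothetical and not part of this work. Exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction


def statistical_distance(p, q):
    """Delta(X, Y) = 1/2 * sum_w |X(w) - Y(w)|; outcomes missing from a dict have probability 0."""
    support = set(p) | set(q)
    return sum(abs(p.get(w, 0) - q.get(w, 0)) for w in support) / 2


def is_eps_uniform(p, domain, eps):
    """True if p is within statistical distance eps of the uniform distribution on `domain`."""
    uniform = {w: Fraction(1, len(domain)) for w in domain}
    return statistical_distance(p, uniform) <= eps


# Example: a slightly biased coin versus a fair coin.
biased = {0: Fraction(11, 20), 1: Fraction(9, 20)}
fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(statistical_distance(biased, fair))  # 1/20
```

Here the biased coin is exactly $\frac{1}{20}$-uniform, and no better.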

Throughout the paper, we use a “randomized-rounding parameter” r that we let be a fixed function
$r(n) = \omega(\sqrt{\log n})$ growing asymptotically faster than $\sqrt{\log n}$. By “fixed function” we mean that $r = \omega(\sqrt{\log n})$
always refers to the very same function, and no other factors will be absorbed into the $\omega(\cdot)$
notation. This allows us to keep track of the precise multiplicative constants introduced by our constructions.
Concretely, we take $r \approx \sqrt{\ln(2/\epsilon)/\pi}$, where $\epsilon$ is a desired bound on the statistical error introduced by
each randomized-rounding operation for $\mathbb{Z}$, because the error is bounded by $\approx 2\exp(-\pi r^2)$ according to
Lemma 2.3 below. For example, for $\epsilon = 2^{-54}$ we have $r \leq 3.5$, and for $\epsilon = 2^{-71}$ we have $r \leq 4$.
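The concrete choice of r can be checked numerically. This sketch (the function name `rounding_parameter` is ours) evaluates $r = \sqrt{\ln(2/\epsilon)/\pi}$ and confirms that plugging it back into $2\exp(-\pi r^2)$ returns $\epsilon$, which reproduces the two example bounds stated above.

```python
import math


def rounding_parameter(eps):
    """r = sqrt(ln(2/eps)/pi), chosen so that the per-operation error 2*exp(-pi*r^2) equals eps."""
    return math.sqrt(math.log(2 / eps) / math.pi)


for k in (54, 71):
    r = rounding_parameter(2.0 ** -k)
    err = 2 * math.exp(-math.pi * r * r)   # should recover eps = 2^-k
    print(f"eps = 2^-{k}: r = {r:.4f}, 2*exp(-pi r^2) = 2^{math.log2(err):.2f}")
```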

2.1  Linear  Algebra 

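A unimodular matrix $\mathbf{U} \in \mathbb{Z}^{m \times m}$ is one for which $|\det(\mathbf{U})| = 1$; in particular, $\mathbf{U}^{-1} \in \mathbb{Z}^{m \times m}$ as well. The
Gram-Schmidt orthogonalization of an ordered set of vectors $\mathbf{V} = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\} \in \mathbb{R}^n$ is $\widetilde{\mathbf{V}} = \{\widetilde{\mathbf{v}}_1, \ldots, \widetilde{\mathbf{v}}_k\}$,
where $\widetilde{\mathbf{v}}_i$ is the component of $\mathbf{v}_i$ orthogonal to $\mathrm{span}(\mathbf{v}_1, \ldots, \mathbf{v}_{i-1})$ for all $i = 1, \ldots, k$. (In some cases we
orthogonalize the vectors in a different order.) In matrix form, $\mathbf{V} = \mathbf{Q}\mathbf{D}\mathbf{U}$ for some orthogonal $\mathbf{Q} \in \mathbb{R}^{n \times k}$,
diagonal $\mathbf{D} \in \mathbb{R}^{k \times k}$ with nonnegative entries, and upper unitriangular $\mathbf{U} \in \mathbb{R}^{k \times k}$ (i.e., $\mathbf{U}$ is upper triangular
with 1s on the diagonal). The decomposition is unique when the $\mathbf{v}_i$ are linearly independent, and we always
have $\|\widetilde{\mathbf{v}}_i\| = d_{i,i}$, the ith diagonal entry of $\mathbf{D}$.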

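For any basis $\mathbf{V} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ of $\mathbb{R}^n$, its origin-centered parallelepiped is defined as
$\mathcal{P}_{1/2}(\mathbf{V}) = \mathbf{V} \cdot [-\frac{1}{2}, \frac{1}{2})^n$. Its dual basis is defined as $\mathbf{V}^* = \mathbf{V}^{-t} = (\mathbf{V}^{-1})^t$. If we orthogonalize $\mathbf{V}$ and $\mathbf{V}^*$ in forward
and reverse order, respectively, then we have $\widetilde{\mathbf{v}}^*_i = \widetilde{\mathbf{v}}_i / \|\widetilde{\mathbf{v}}_i\|^2$ for all i. In particular, $\|\widetilde{\mathbf{v}}^*_i\| = 1/\|\widetilde{\mathbf{v}}_i\|$.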
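The reverse-order relation between the two orthogonalizations is easy to check exactly on a small example. The sketch below is a toy illustration, not code from this work: it uses rational arithmetic, a hypothetical 3×3 basis, and our own helper names (`gram_schmidt`, `inverse`, `dual_basis`). The dual vectors are obtained as the rows of $\mathbf{V}^{-1}$, which are exactly the columns of $\mathbf{V}^{-t}$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthogonalize in the given order: each output is the component of v_i orthogonal
    to the span of the earlier vectors (assumes linear independence)."""
    ortho = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for u in ortho:
            mu = dot(w, u) / dot(u, u)
            w = [wi - mu * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho


def inverse(M):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix given as a list of rows."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_basis(vectors):
    """V* = V^{-t}: with the v_i as the columns of V, the i-th dual vector is row i of V^{-1}."""
    n = len(vectors)
    V = [[vectors[c][r] for c in range(n)] for r in range(n)]
    return inverse(V)


basis = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # v_1, v_2, v_3: a toy basis of R^3
forward = gram_schmidt(basis)               # orthogonalize V in forward order
dual = dual_basis(basis)
backward = gram_schmidt(dual[::-1])[::-1]   # orthogonalize V* in reverse order
for vt, dt in zip(forward, backward):
    assert dt == [x / dot(vt, vt) for x in vt]   # v~*_i = v~_i / ||v~_i||^2
    assert dot(dt, dt) * dot(vt, vt) == 1        # ||v~*_i|| = 1 / ||v~_i||
```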

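For any square real matrix $\mathbf{X}$, the (Moore-Penrose) pseudoinverse, denoted $\mathbf{X}^+$, is the unique matrix
satisfying $(\mathbf{X}\mathbf{X}^+)\mathbf{X} = \mathbf{X}$, $\mathbf{X}^+(\mathbf{X}\mathbf{X}^+) = \mathbf{X}^+$, and such that both $\mathbf{X}\mathbf{X}^+$ and $\mathbf{X}^+\mathbf{X}$ are symmetric. We
always have $\mathrm{span}(\mathbf{X}) = \mathrm{span}(\mathbf{X}^+)$, and when $\mathbf{X}$ is invertible, we have $\mathbf{X}^+ = \mathbf{X}^{-1}$.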

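A symmetric matrix $\mathbf{\Sigma} \in \mathbb{R}^{n \times n}$ is positive definite (respectively, positive semidefinite), written $\mathbf{\Sigma} > 0$
(resp., $\mathbf{\Sigma} \geq 0$), if $\mathbf{x}^t\mathbf{\Sigma}\mathbf{x} > 0$ (resp., $\mathbf{x}^t\mathbf{\Sigma}\mathbf{x} \geq 0$) for all nonzero $\mathbf{x} \in \mathbb{R}^n$. We have $\mathbf{\Sigma} > 0$ if and only if $\mathbf{\Sigma}$
is invertible and $\mathbf{\Sigma}^{-1} > 0$, and $\mathbf{\Sigma} \geq 0$ if and only if $\mathbf{\Sigma}^+ \geq 0$. Positive (semi)definiteness defines a partial
ordering on symmetric matrices: we say that $\mathbf{\Sigma}_1 > \mathbf{\Sigma}_2$ if $(\mathbf{\Sigma}_1 - \mathbf{\Sigma}_2) > 0$, and similarly for $\mathbf{\Sigma}_1 \geq \mathbf{\Sigma}_2$. We
have $\mathbf{\Sigma}_1 \geq \mathbf{\Sigma}_2 > 0$ if and only if $\mathbf{\Sigma}_2^{-1} \geq \mathbf{\Sigma}_1^{-1} > 0$, and likewise for the analogous strict inequalities.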

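For any matrix $\mathbf{B}$, the symmetric matrix $\mathbf{\Sigma} = \mathbf{B}\mathbf{B}^t$ is positive semidefinite, because

$$\mathbf{x}^t\mathbf{\Sigma}\mathbf{x} = \langle \mathbf{B}^t\mathbf{x}, \mathbf{B}^t\mathbf{x} \rangle = \|\mathbf{B}^t\mathbf{x}\|^2 \geq 0$$

for any nonzero $\mathbf{x} \in \mathbb{R}^n$, where the inequality is always strict if and only if $\mathbf{B}$ is nonsingular. We say that
$\mathbf{B}$ is a square root of $\mathbf{\Sigma} \geq 0$, written $\mathbf{B} = \sqrt{\mathbf{\Sigma}}$, if $\mathbf{B}\mathbf{B}^t = \mathbf{\Sigma}$. Every $\mathbf{\Sigma} \geq 0$ has a square root, which can be
computed efficiently, e.g., via the Cholesky decomposition.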
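As a minimal illustration of computing a square root, here is a textbook Cholesky factorization in plain Python (our own sketch, restricted to positive definite $\mathbf{\Sigma}$; a merely semidefinite $\mathbf{\Sigma}$ would need a pivoted or $\mathbf{L}\mathbf{D}\mathbf{L}^t$ variant). Square roots are not unique: $\mathbf{B}\mathbf{Q}$ is another one for any orthogonal $\mathbf{Q}$, and Cholesky simply picks the lower-triangular one.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with L L^t = sigma, for a symmetric positive definite sigma.

    This sketch does not handle singular (merely semidefinite) sigma."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(sigma[i][i] - acc)
            else:
                L[i][j] = (sigma[i][j] - acc) / L[j][j]
    return L


def times_transpose(B):
    """B B^t for a matrix given as a list of rows."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in B] for ri in B]


Sigma = [[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
B = cholesky(Sigma)                 # one square root of Sigma
recon = times_transpose(B)          # should reproduce Sigma
print(B)
```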

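For any matrix $\mathbf{B} \in \mathbb{R}^{n \times k}$, there exists a singular value decomposition $\mathbf{B} = \mathbf{Q}\mathbf{D}\mathbf{P}^t$, where $\mathbf{Q} \in \mathbb{R}^{n \times n}$,
$\mathbf{P} \in \mathbb{R}^{k \times k}$ are orthogonal matrices, and $\mathbf{D} \in \mathbb{R}^{n \times k}$ is a diagonal matrix with nonnegative entries $s_i \geq 0$ on
the diagonal, in non-increasing order. The $s_i$ are called the singular values of $\mathbf{B}$. Under this convention, $\mathbf{D}$ is
uniquely determined (though $\mathbf{Q}, \mathbf{P}$ may not be), and $s_1(\mathbf{B}) = \max_{\mathbf{u}} \|\mathbf{B}\mathbf{u}\| = \max_{\mathbf{u}} \|\mathbf{B}^t\mathbf{u}\| \geq \|\mathbf{B}\|, \|\mathbf{B}^t\|$,
where the maxima are taken over all unit vectors $\mathbf{u}$ of the appropriate dimension ($\mathbb{R}^k$ and $\mathbb{R}^n$, respectively).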

2.2  Lattices  and  Hard  Problems 

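Generally defined, an m-dimensional lattice $\Lambda$ is a discrete additive subgroup of $\mathbb{R}^m$. For some $k \leq m$, called
the rank of the lattice, $\Lambda$ is generated as the set of all $\mathbb{Z}$-linear combinations of some k linearly independent
basis vectors $\mathbf{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_k\}$, i.e., $\Lambda = \{\mathbf{B}\mathbf{z} : \mathbf{z} \in \mathbb{Z}^k\}$. In this work, we are mostly concerned with
full-rank integer lattices, i.e., $\Lambda \subseteq \mathbb{Z}^m$ with $k = m$. (We work with non-full-rank lattices only in the analysis
of our Gaussian sampling algorithm in Section 5.4.) The dual lattice $\Lambda^*$ is the set of all $\mathbf{v} \in \mathrm{span}(\Lambda)$ such
that $\langle \mathbf{v}, \mathbf{x} \rangle \in \mathbb{Z}$ for every $\mathbf{x} \in \Lambda$. If $\mathbf{B}$ is a basis of $\Lambda$, then $\mathbf{B}^* = \mathbf{B}(\mathbf{B}^t\mathbf{B})^{-1}$ is a basis of $\Lambda^*$. Note that when
$\Lambda$ is full-rank, $\mathbf{B}$ is invertible and hence $\mathbf{B}^* = \mathbf{B}^{-t}$.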

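Many cryptographic applications use a particular family of so-called q-ary integer lattices, which contain
$q\mathbb{Z}^m$ as a sublattice for some (typically small) integer q. For positive integers n and q, let $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ be
arbitrary and define the following full-rank m-dimensional q-ary lattices:

$$\Lambda^\perp(\mathbf{A}) = \{\mathbf{z} \in \mathbb{Z}^m : \mathbf{A}\mathbf{z} = \mathbf{0} \bmod q\}$$

$$\Lambda(\mathbf{A}^t) = \{\mathbf{z} \in \mathbb{Z}^m : \exists\, \mathbf{s} \in \mathbb{Z}_q^n \text{ s.t. } \mathbf{z} = \mathbf{A}^t\mathbf{s} \bmod q\}.$$

It is easy to check that $\Lambda^\perp(\mathbf{A})$ and $\Lambda(\mathbf{A}^t)$ are dual lattices, up to a q scaling factor: $q \cdot \Lambda^\perp(\mathbf{A})^* = \Lambda(\mathbf{A}^t)$,
and vice-versa. For this reason, it is sometimes more natural to consider the non-integral, “1-ary” lattice
$\frac{1}{q}\Lambda(\mathbf{A}^t) = \Lambda^\perp(\mathbf{A})^* \supseteq \mathbb{Z}^m$. For any $\mathbf{u} \in \mathbb{Z}_q^n$ admitting an integral solution $\mathbf{x}$ to $\mathbf{A}\mathbf{x} = \mathbf{u} \bmod q$, define the
coset (or “shifted” lattice)

$$\Lambda^\perp_{\mathbf{u}}(\mathbf{A}) = \{\mathbf{z} \in \mathbb{Z}^m : \mathbf{A}\mathbf{z} = \mathbf{u} \bmod q\} = \Lambda^\perp(\mathbf{A}) + \mathbf{x}.$$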
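For toy parameters both lattices can be enumerated inside the box $\{0, \ldots, q-1\}^m$ (which suffices, since both contain $q\mathbb{Z}^m$), and the duality can be checked through the fact that every $\mathbf{z}_1 \in \Lambda^\perp(\mathbf{A})$ and $\mathbf{z}_2 \in \Lambda(\mathbf{A}^t)$ satisfy $\langle \mathbf{z}_1, \mathbf{z}_2 \rangle \equiv 0 \pmod q$. The helpers `in_perp` and `in_image` are hypothetical names for this sketch, and the brute-force search over $\mathbf{s}$ is only feasible for tiny n and q.

```python
import itertools


def in_perp(A, z, q):
    """z is in Lambda^perp(A) iff A z = 0 (mod q)."""
    return all(sum(a * x for a, x in zip(row, z)) % q == 0 for row in A)


def in_image(A, z, q):
    """z is in Lambda(A^t) iff z = A^t s (mod q) for some s in Z_q^n (brute force over s)."""
    n, m = len(A), len(A[0])
    return any(
        all((sum(A[i][j] * s[i] for i in range(n)) - z[j]) % q == 0 for j in range(m))
        for s in itertools.product(range(q), repeat=n)
    )


q, A = 5, [[1, 2, 3]]                      # toy parameters: n = 1, m = 3
box = list(itertools.product(range(q), repeat=3))
perp = [z for z in box if in_perp(A, z, q)]
image = [z for z in box if in_image(A, z, q)]
# Duality up to the factor q: <z1, z2> = 0 (mod q) for every pair.
assert all(sum(x * y for x, y in zip(u, v)) % q == 0 for u in perp for v in image)
print(len(perp), len(image))  # 25 5
```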

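Here we recall some basic facts about these q-ary lattices.

Lemma 2.1. Let $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ be arbitrary and let $\mathbf{S} \in \mathbb{Z}^{m \times m}$ be any basis of $\Lambda^\perp(\mathbf{A})$.

1. For any unimodular $\mathbf{T} \in \mathbb{Z}^{m \times m}$, we have $\mathbf{T} \cdot \Lambda^\perp(\mathbf{A}) = \Lambda^\perp(\mathbf{A} \cdot \mathbf{T}^{-1})$, with $\mathbf{T} \cdot \mathbf{S}$ as a basis.

2. [ABB10a, implicit] For any invertible $\mathbf{H} \in \mathbb{Z}_q^{n \times n}$, we have $\Lambda^\perp(\mathbf{H} \cdot \mathbf{A}) = \Lambda^\perp(\mathbf{A})$.

3. [CHKP10, Lemma 3.2] Suppose that the columns of $\mathbf{A}$ generate all of $\mathbb{Z}_q^n$, let $\mathbf{A}' \in \mathbb{Z}_q^{n \times m'}$ be arbitrary,
and let $\mathbf{W} \in \mathbb{Z}^{m \times m'}$ be an arbitrary solution to $\mathbf{A}\mathbf{W} = -\mathbf{A}' \bmod q$. Then $\mathbf{S}' = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{W} & \mathbf{S} \end{bmatrix}$ is a basis of
$\Lambda^\perp([\mathbf{A}' \mid \mathbf{A}])$, and when orthogonalized in appropriate order, $\widetilde{\mathbf{S}'} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \widetilde{\mathbf{S}} \end{bmatrix}$. In particular, $\|\widetilde{\mathbf{S}'}\| = \|\widetilde{\mathbf{S}}\|$.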
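Item 3 can be sanity-checked on a toy instance with n = 1 and prime q (our own choice of parameters, not from this work). Taking $\mathbf{A} = [1 \mid a_2 \mid \cdots \mid a_m]$ makes its columns generate $\mathbb{Z}_q$, and then $q\mathbf{e}_1$ together with $\mathbf{e}_i - a_i\mathbf{e}_1$ ($i \geq 2$) is a basis $\mathbf{S}$ of $\Lambda^\perp(\mathbf{A})$; a solution $\mathbf{W}$ is obtained by placing $-\mathbf{A}'$ in its first row. The sketch checks that every column of $\mathbf{S}'$ lies in $\Lambda^\perp([\mathbf{A}' \mid \mathbf{A}])$ and that $|\det \mathbf{S}'| = q$, which is the determinant of that lattice here (the map $\mathbf{z} \mapsto [\mathbf{A}' \mid \mathbf{A}]\mathbf{z} \bmod q$ is onto $\mathbb{Z}_q$), so $\mathbf{S}'$ is indeed a basis. The helper `det` is a hypothetical name.

```python
from fractions import Fraction


def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, result = len(A), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return result


q = 7
A = [1, 3, 5]          # n = 1, m = 3; the leading 1 makes the columns generate Z_q
A_prime = [2, 6]       # arbitrary A' in Z_q^{1 x m'}, m' = 2
m, mp = len(A), len(A_prime)

# Basis S of Lambda^perp(A), column by column: q*e_1 and e_i - a_i*e_1 for i >= 2.
S_cols = [[q] + [0] * (m - 1)]
for i in range(1, m):
    col = [0] * m
    col[0], col[i] = -A[i], 1
    S_cols.append(col)
S_rows = [[S_cols[c][r] for c in range(m)] for r in range(m)]

# W in Z^{m x m'} with A W = -A' (mod q): -A' in the first row, zeros elsewhere.
W = [[-a for a in A_prime]] + [[0] * mp for _ in range(m - 1)]

# S' = [[I, 0], [W, S]] as a list of rows.
S_prime = ([[int(r == c) for c in range(mp)] + [0] * m for r in range(mp)]
           + [W[r] + S_rows[r] for r in range(m)])

full = A_prime + A     # the 1 x (m' + m) matrix [A' | A]
assert all(sum(x * S_prime[r][c] for r, x in enumerate(full)) % q == 0
           for c in range(mp + m))
print(det(S_rows), det(S_prime))  # 7 7
```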
Cryptographic problems. For $\beta > 0$, the short integer solution problem $\mathrm{SIS}_{q,\beta}$ is an average-case version
of the approximate shortest vector problem on $\Lambda^\perp(\mathbf{A})$. The problem is: given uniformly random $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$
for any desired $m = \mathrm{poly}(n)$, find a relatively short nonzero $\mathbf{z} \in \Lambda^\perp(\mathbf{A})$, i.e., output a nonzero $\mathbf{z} \in \mathbb{Z}^m$ such
that $\mathbf{A}\mathbf{z} = \mathbf{0} \bmod q$ and $\|\mathbf{z}\| \leq \beta$. When $q \geq \beta\sqrt{n} \cdot \omega(\sqrt{\log n})$, solving this problem (with any non-negligible
probability over the random choice of $\mathbf{A}$) is at least as hard as (probabilistically) approximating the Shortest
Independent Vectors Problem (SIVP, a classic problem in the computational study of point lattices [MG02])
on n-dimensional lattices to within $\tilde{O}(\beta\sqrt{n})$ factors in the worst case [Ajt96, MR04, GPV08].
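To make the definition concrete, the sketch below (hypothetical helpers `is_sis_solution` and `brute_force_sis`, toy parameters of our choosing) searches $\{-1, 0, 1\}^m$ exhaustively. With n = 2, q = 17 and m = 10, a solution of norm at most $\sqrt{m}$ is guaranteed by pigeonhole: the $2^{10} = 1024$ vectors in $\{0,1\}^{10}$ map into only $17^2 = 289$ syndromes, so two of them collide and their difference is a solution. Exhaustive search is of course exponential in m, which is exactly what the hardness assumption relies on at cryptographic sizes.

```python
import itertools
import math
import random


def is_sis_solution(A, z, q, beta):
    """Check that z is nonzero, A z = 0 (mod q), and ||z|| <= beta."""
    nonzero = any(x != 0 for x in z)
    in_kernel = all(sum(a * x for a, x in zip(row, z)) % q == 0 for row in A)
    return nonzero and in_kernel and math.sqrt(sum(x * x for x in z)) <= beta


def brute_force_sis(A, q, beta):
    """Exhaustive search over {-1, 0, 1}^m (exponential in m: toy sizes only)."""
    m = len(A[0])
    for z in itertools.product((-1, 0, 1), repeat=m):
        if is_sis_solution(A, z, q, beta):
            return z
    return None


rng = random.Random(1)
n, m, q = 2, 10, 17
A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
z = brute_force_sis(A, q, beta=math.sqrt(m))
print(z)
```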

For $\alpha > 0$, the learning with errors problem $\mathrm{LWE}_{q,\alpha}$ may be seen as an average-case version of the
bounded-distance decoding problem on the dual lattice $\frac{1}{q}\Lambda(\mathbf{A}^t)$. Let $\mathbb{T} = \mathbb{R}/\mathbb{Z}$, the additive group of
reals modulo 1, and let $D_\alpha$ denote the Gaussian probability distribution over $\mathbb{R}$ with parameter $\alpha$ (see
Section 2.3 below). For any fixed $\mathbf{s} \in \mathbb{Z}_q^n$, define $\mathcal{A}_{\mathbf{s},\alpha}$ to be the distribution over $\mathbb{Z}_q^n \times \mathbb{T}$ obtained by
choosing $\mathbf{a} \leftarrow \mathbb{Z}_q^n$ uniformly at random, choosing $e \leftarrow D_\alpha$, and outputting $(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle/q + e \bmod 1)$.
The search-$\mathrm{LWE}_{q,\alpha}$ problem is: given any desired number $m = \mathrm{poly}(n)$ of independent samples from $\mathcal{A}_{\mathbf{s},\alpha}$
for some arbitrary $\mathbf{s}$, find $\mathbf{s}$. The decision-$\mathrm{LWE}_{q,\alpha}$ problem is to distinguish, with non-negligible advantage,
between samples from $\mathcal{A}_{\mathbf{s},\alpha}$ for uniformly random $\mathbf{s} \in \mathbb{Z}_q^n$, and uniformly random samples from $\mathbb{Z}_q^n \times \mathbb{T}$.
There are a variety of (incomparable) search/decision reductions for LWE under certain conditions on the
parameters (e.g., [Reg05, Pei09b, ACPS09]); in Section 3 we give a reduction that essentially subsumes
them all. When $q \geq 2\sqrt{n}/\alpha$, solving search-$\mathrm{LWE}_{q,\alpha}$ is at least as hard as quantumly approximating SIVP
on n-dimensional lattices to within $\tilde{O}(n/\alpha)$ factors in the worst case [Reg05]. For a restricted range of
parameters (e.g., when q is exponentially large) a classical (non-quantum) reduction is also known [Pei09b],
but only from a potentially easier class of problems like the decisional Shortest Vector Problem (GapSVP)
and the Bounded Distance Decoding Problem (BDD) (see [LM09]).

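Note that the m samples $(\mathbf{a}_i, b_i)$ and underlying error terms $e_i$ from $\mathcal{A}_{\mathbf{s},\alpha}$ may be grouped into a matrix
$\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ and vectors $\mathbf{b} \in \mathbb{T}^m$, $\mathbf{e} \in \mathbb{R}^m$ in the natural way, so that $\mathbf{b} = (\mathbf{A}^t\mathbf{s})/q + \mathbf{e} \bmod 1$. In this way, $\mathbf{b}$
may be seen as an element of $\Lambda^\perp(\mathbf{A})^* = \frac{1}{q}\Lambda(\mathbf{A}^t)$, perturbed by Gaussian error. By scaling $\mathbf{b}$ and discretizing
its entries using a form of randomized rounding (see [Pei10]), we can convert it into $\mathbf{b}' = \mathbf{A}^t\mathbf{s} + \mathbf{e}' \bmod q$,
where $\mathbf{e}' \in \mathbb{Z}^m$ has discrete Gaussian distribution with parameter (say) $\sqrt{2}\alpha q$.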
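The following sketch generates a toy instance in the discretized form $\mathbf{b}' = \mathbf{A}^t\mathbf{s} + \mathbf{e}' \bmod q$ and checks that, given $\mathbf{s}$, the error is exactly the centered residue of $\mathbf{b}' - \mathbf{A}^t\mathbf{s}$. As a simplifying assumption, each error entry is a continuous Gaussian of parameter $\sqrt{2}\alpha q$ rounded to the nearest integer: a stand-in for the randomized rounding of [Pei10], not an exact discrete Gaussian. The function names and parameters are hypothetical.

```python
import math
import random


def center(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def lwe_instance(n, m, q, alpha, rng):
    """Toy LWE instance b = A^t s + e (mod q) with A uniform in Z_q^{n x m}.

    Assumption: e_j = round(continuous Gaussian of parameter sqrt(2)*alpha*q), i.e. of
    standard deviation sqrt(2)*alpha*q / sqrt(2*pi)."""
    s = [rng.randrange(q) for _ in range(n)]
    A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    sigma = math.sqrt(2) * alpha * q / math.sqrt(2 * math.pi)
    e = [round(rng.gauss(0.0, sigma)) for _ in range(m)]
    b = [(sum(A[i][j] * s[i] for i in range(n)) + e[j]) % q for j in range(m)]
    return s, A, b, e


rng = random.Random(2024)
n, m, q, alpha = 4, 12, 97, 0.01
s, A, b, e = lwe_instance(n, m, q, alpha, rng)
# Knowing s, the error is recovered as the centered residue of b - A^t s.
recovered = [center(b[j] - sum(A[i][j] * s[i] for i in range(n)), q) for j in range(m)]
print(recovered)
```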

2.3  Gaussians  and  Lattices 

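The n-dimensional Gaussian function $\rho : \mathbb{R}^n \to (0, 1]$ is defined as

$$\rho(\mathbf{x}) = \exp(-\pi \cdot \|\mathbf{x}\|^2) = \exp(-\pi \cdot \langle \mathbf{x}, \mathbf{x} \rangle).$$

Applying a linear transformation given by a (not necessarily square) matrix $\mathbf{B}$ with linearly independent
columns yields the (possibly degenerate) Gaussian function

$$\rho_{\mathbf{B}}(\mathbf{x}) = \begin{cases} \rho(\mathbf{B}^+\mathbf{x}) = \exp(-\pi \cdot \mathbf{x}^t\mathbf{\Sigma}^+\mathbf{x}) & \text{if } \mathbf{x} \in \mathrm{span}(\mathbf{B}) = \mathrm{span}(\mathbf{\Sigma}), \\ 0 & \text{otherwise,} \end{cases}$$

where $\mathbf{\Sigma} = \mathbf{B}\mathbf{B}^t \geq 0$. Because $\rho_{\mathbf{B}}$ is distinguished only up to $\mathbf{\Sigma}$, we usually refer to it as $\rho_{\sqrt{\mathbf{\Sigma}}}$.

Normalizing $\rho_{\sqrt{\mathbf{\Sigma}}}$ by its total measure over $\mathrm{span}(\mathbf{\Sigma})$, we obtain the probability distribution function of
the (continuous) Gaussian distribution $D_{\sqrt{\mathbf{\Sigma}}}$. By linearity of expectation, this distribution has covariance
$\mathbb{E}_{\mathbf{x} \leftarrow D_{\sqrt{\mathbf{\Sigma}}}}[\mathbf{x} \cdot \mathbf{x}^t] = \frac{\mathbf{\Sigma}}{2\pi}$. (The $\frac{1}{2\pi}$ factor is the variance of the Gaussian $D_1$, due to our choice of normalization.)
For convenience, we implicitly ignore the $\frac{1}{2\pi}$ factor, and refer to $\mathbf{\Sigma}$ as the covariance matrix of $D_{\sqrt{\mathbf{\Sigma}}}$.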
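Let $\Lambda \subset \mathbb{R}^n$ be a lattice, let $\mathbf{c} \in \mathbb{R}^n$, and let $\mathbf{\Sigma} \geq 0$ be a positive semidefinite matrix such that
$(\Lambda + \mathbf{c}) \cap \mathrm{span}(\mathbf{\Sigma})$ is nonempty. The discrete Gaussian distribution $D_{\Lambda+\mathbf{c},\sqrt{\mathbf{\Sigma}}}$ is simply the Gaussian
distribution $D_{\sqrt{\mathbf{\Sigma}}}$ restricted to have support $\Lambda + \mathbf{c}$. That is, for all $\mathbf{x} \in \Lambda + \mathbf{c}$,

$$D_{\Lambda+\mathbf{c},\sqrt{\mathbf{\Sigma}}}(\mathbf{x}) = \frac{\rho_{\sqrt{\mathbf{\Sigma}}}(\mathbf{x})}{\rho_{\sqrt{\mathbf{\Sigma}}}(\Lambda + \mathbf{c})} \propto \rho_{\sqrt{\mathbf{\Sigma}}}(\mathbf{x}).$$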
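One simple (if slow) way to sample $D_{\mathbb{Z}+c,s}$ in one dimension is rejection sampling: draw a uniform point of the coset from a window and accept it with probability $\rho_s(x)$. The sketch below is our own illustration, not the sampling algorithm of this work; it assumes that the mass outside $|x| \leq 12s$ is negligible and simply ignores it, and the function names are hypothetical. Note that, in the notation above, the Gaussian is centered at the origin; only its support is shifted to $\mathbb{Z} + c$.

```python
import math
import random


def rho(x, s):
    """One-dimensional Gaussian function rho_s(x) = exp(-pi x^2 / s^2)."""
    return math.exp(-math.pi * x * x / (s * s))


def sample_discrete_gaussian(c, s, rng, tail=12.0):
    """Sample D_{Z+c, s}: support Z + c, probabilities proportional to rho_s(x).

    Rejection sampling: propose a uniform coset point x with |x| <= tail*s and
    accept it with probability rho_s(x) (mass outside the window is ignored)."""
    lo = math.ceil(-tail * s - c)
    hi = math.floor(tail * s - c)
    while True:
        x = rng.randint(lo, hi) + c
        if rng.random() < rho(x, s):
            return x


rng = random.Random(11)
c, s, N = 0.25, 3.0, 10_000
samples = [sample_discrete_gaussian(c, s, rng) for _ in range(N)]
mean = sum(samples) / N
second_moment = sum(x * x for x in samples) / N
print(mean, second_moment, s * s / (2 * math.pi))
```

For s well above the smoothing parameter of $\mathbb{Z}$, the empirical mean is close to 0 and the second moment close to $s^2/(2\pi)$, matching the normalization discussed above.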

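We recall the definition of the smoothing parameter from [MR04], generalized to non-spherical (and
potentially degenerate) Gaussians. It is easy to see that the definition is consistent with the partial ordering of
positive semidefinite matrices, i.e., if $\sqrt{\mathbf{\Sigma}_1} \geq \sqrt{\mathbf{\Sigma}_2} \geq \eta_\epsilon(\Lambda)$, then $\sqrt{\mathbf{\Sigma}_1} \geq \eta_\epsilon(\Lambda)$.

Definition 2.2. Let $\mathbf{\Sigma} \geq 0$ and $\Lambda \subset \mathrm{span}(\mathbf{\Sigma})$ be a lattice. We say that $\sqrt{\mathbf{\Sigma}} \geq \eta_\epsilon(\Lambda)$ if $\rho_{\sqrt{\mathbf{\Sigma}^+}}(\Lambda^*) \leq 1 + \epsilon$.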

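The following is a bound on the smoothing parameter in terms of any orthogonalized basis. Note that for
practical choices like $n \leq 2^{14}$ and $\epsilon \geq 2^{-80}$, the multiplicative factor attached to $\|\widetilde{\mathbf{B}}\|$ is bounded by 4.6.

Lemma 2.3 ([GPV08, Theorem 3.1]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice with basis $\mathbf{B}$, and let $\epsilon > 0$. We have

$$\eta_\epsilon(\Lambda) \leq \|\widetilde{\mathbf{B}}\| \cdot \sqrt{\ln(2n(1 + 1/\epsilon))/\pi}.$$

In particular, for any $\omega(\sqrt{\log n})$ function, there is a negligible $\epsilon(n)$ for which $\eta_\epsilon(\Lambda) \leq \|\widetilde{\mathbf{B}}\| \cdot \omega(\sqrt{\log n})$.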
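For $\Lambda = \mathbb{Z}$ (which is self-dual, with $\|\widetilde{\mathbf{B}}\| = 1$) the smoothing parameter can be computed numerically from Definition 2.2 and compared against Lemma 2.3. The sketch below uses bisection on the decreasing function $s \mapsto \rho_{1/s}(\mathbb{Z} \setminus \{0\})$, with the sum truncated to 100 terms (an assumption that is harmless in the ranges used here), and also evaluates the multiplicative factor for $n = 2^{14}$, $\epsilon = 2^{-80}$ quoted above. The function names are hypothetical.

```python
import math


def rho_dual_nonzero(s, terms=100):
    """rho_{1/s}(Z minus {0}) = 2 * sum_{k >= 1} exp(-pi s^2 k^2), truncated to `terms` terms."""
    return 2 * sum(math.exp(-math.pi * (s * k) ** 2) for k in range(1, terms + 1))


def smoothing_parameter_Z(eps, lo=1e-3, hi=100.0, iters=200):
    """Smallest s with rho_{1/s}(Z* minus {0}) <= eps (Z is self-dual), found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rho_dual_nonzero(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


def lemma_2_3_factor(n, eps):
    """The factor sqrt(ln(2n(1 + 1/eps))/pi) multiplying ||B~|| in Lemma 2.3."""
    return math.sqrt(math.log(2 * n * (1 + 1 / eps)) / math.pi)


eta = smoothing_parameter_Z(0.01)
print(eta, lemma_2_3_factor(1, 0.01))         # exact value vs. the Lemma 2.3 bound for Z (n = 1)
print(lemma_2_3_factor(2 ** 14, 2.0 ** -80))  # the factor quoted in the text
```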

For appropriate parameters, the smoothing parameter of a random lattice $\Lambda^\perp(\mathbf{A})$ is small, with very high
probability.  The  following  bound  is  a  refinement  and  strengthening  of  one  from  [GPV08],  which  allows  for  a 
more  precise  analysis  of  the  parameters  and  statistical  errors  involved  in  our  constructions. 

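Lemma 2.4. Let $n, m, q \geq 2$ be positive integers. For $\mathbf{s} \in \mathbb{Z}_q^n$, let the subgroup $G_{\mathbf{s}} = \{\langle \mathbf{a}, \mathbf{s} \rangle : \mathbf{a} \in \mathbb{Z}_q^n\} \subseteq \mathbb{Z}_q$,
and let $g_{\mathbf{s}} = |G_{\mathbf{s}}| = q/\gcd(s_1, \ldots, s_n, q)$. Let $\epsilon > 0$, $\eta \geq \eta_\epsilon(\mathbb{Z}^m)$, and $s > \eta$ be reals. Then for
uniformly random $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$,

$$\mathbb{E}_{\mathbf{A}}\left[\rho_{1/s}(\Lambda^\perp(\mathbf{A})^*)\right] \leq (1 + \epsilon) \sum_{\mathbf{s} \in \mathbb{Z}_q^n} \max\{1/g_{\mathbf{s}}, \eta/s\}^m. \tag{2.1}$$

In particular, if $q = p^e$ is a power of a prime p, and

$$m \geq \max\left\{ n + \frac{\log(3 + 2/\epsilon)}{\log p},\ \frac{n \log q + \log(2 + 2/\epsilon)}{\log(s/\eta)} \right\}, \tag{2.2}$$

then $\mathbb{E}_{\mathbf{A}}[\rho_{1/s}(\Lambda^\perp(\mathbf{A})^*)] \leq 1 + 2\epsilon$, and so by Markov’s inequality, $s \geq \eta_{2\epsilon/\delta}(\Lambda^\perp(\mathbf{A}))$ except with probability
at most $\delta$.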


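Proof. We will use the fact (which follows from the Poisson summation formula; see [MR04, Lemma 2.8])
that $\rho_t(\Lambda) \leq \rho_r(\Lambda) \leq (r/t)^m \cdot \rho_t(\Lambda)$ for any rank-m lattice $\Lambda$ and $r \geq t > 0$.

For any $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, one can check that $\Lambda^\perp(\mathbf{A})^* = \mathbb{Z}^m + \{\mathbf{A}^t\mathbf{s}/q : \mathbf{s} \in \mathbb{Z}_q^n\}$. Note that $\mathbf{A}^t\mathbf{s}$ is uniformly
random over $G_{\mathbf{s}}^m$ for uniformly random $\mathbf{A}$. Then we have

$$\begin{aligned}
\mathbb{E}_{\mathbf{A}}\left[\rho_{1/s}(\Lambda^\perp(\mathbf{A})^*)\right] &\leq \sum_{\mathbf{s} \in \mathbb{Z}_q^n} \mathbb{E}_{\mathbf{A}}\left[\rho_{1/s}(\mathbb{Z}^m + \mathbf{A}^t\mathbf{s}/q)\right] && \text{(lin. of } \mathbb{E}\text{)} \\
&= \sum_{\mathbf{s} \in \mathbb{Z}_q^n} g_{\mathbf{s}}^{-m} \cdot \rho_{1/s}(g_{\mathbf{s}}^{-1}\mathbb{Z}^m) && \text{(avg. over } \mathbf{A}\text{)} \\
&\leq \sum_{\mathbf{s} \in \mathbb{Z}_q^n} g_{\mathbf{s}}^{-m} \cdot \max\{1, g_{\mathbf{s}}\eta/s\}^m \cdot \rho_{1/\eta}(\mathbb{Z}^m) && \text{(above fact)} \\
&\leq (1 + \epsilon) \sum_{\mathbf{s} \in \mathbb{Z}_q^n} \max\{1/g_{\mathbf{s}}, \eta/s\}^m && (\eta \geq \eta_\epsilon(\mathbb{Z}^m)).
\end{aligned}$$

To prove the second part of the claim, observe that $g_{\mathbf{s}} = p^i$ for some $i \geq 0$, and that there are at most $g^n$
values of $\mathbf{s}$ for which $g_{\mathbf{s}} = g$, because each entry of $\mathbf{s}$ must be in $G_{\mathbf{s}}$. Therefore,

$$\sum_{\mathbf{s} \in \mathbb{Z}_q^n} \frac{1}{g_{\mathbf{s}}^m} \leq \sum_{i \geq 0} p^{i(n-m)} = \frac{1}{1 - p^{n-m}} \leq 1 + \frac{\epsilon}{2(1 + \epsilon)}.$$

(More generally, for arbitrary q we have $\sum_{\mathbf{s}} 1/g_{\mathbf{s}}^m \leq \zeta(m - n)$, where $\zeta(\cdot)$ is the Riemann zeta function.)
Similarly, $\sum_{\mathbf{s}} (\eta/s)^m = q^n (s/\eta)^{-m} \leq \frac{\epsilon}{2(1 + \epsilon)}$, and the claim follows from (2.1) because $\max\{x, y\}^m \leq x^m + y^m$ for $x, y \geq 0$. □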
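The geometric-series bound in the second part of the proof can be checked by brute force for a toy prime power. The sketch below (hypothetical helper names; parameters p = 2, e = 3, n = 2, m = 6 chosen only for illustration) enumerates all $\mathbf{s} \in \mathbb{Z}_q^n$, computes $g_{\mathbf{s}} = q/\gcd(s_1, \ldots, s_n, q)$, and compares $\sum_{\mathbf{s}} g_{\mathbf{s}}^{-m}$ with $1/(1 - p^{n-m})$ and with $\zeta(m - n) = \zeta(4) = \pi^4/90$.

```python
import itertools
import math
from functools import reduce


def g(s, q):
    """g_s = |G_s| = q / gcd(s_1, ..., s_n, q)."""
    return q // reduce(math.gcd, s, q)


def sum_inverse_powers(n, q, m):
    """Exact sum over all s in Z_q^n of g_s^(-m), by enumeration (toy sizes only)."""
    return sum(g(s, q) ** -m for s in itertools.product(range(q), repeat=n))


p, e, n, m = 2, 3, 2, 6                 # toy parameters: q = p^e = 8
q = p ** e
lhs = sum_inverse_powers(n, q, m)
geometric = 1 / (1 - p ** (n - m))      # bound from the proof, for prime-power q
zeta_bound = math.pi ** 4 / 90          # zeta(m - n) with m - n = 4: the general bound
print(lhs, geometric, zeta_bound)
assert lhs <= geometric <= zeta_bound
```

Counting by hand for these parameters gives 1, 3, 12 and 48 vectors with $g_{\mathbf{s}} = 1, 2, 4, 8$ respectively, each within the $g^n$ bound used in the proof.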

We  need  a  number  of  standard  facts  about  discrete  Gaussians. 


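Lemma 2.5 ([MR04, Lemmas 2.9 and 4.1]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice. For any $\mathbf{\Sigma} \geq 0$ and $\mathbf{c} \in \mathbb{R}^n$,
we have $\rho_{\sqrt{\mathbf{\Sigma}}}(\Lambda + \mathbf{c}) \leq \rho_{\sqrt{\mathbf{\Sigma}}}(\Lambda)$. Moreover, if $\sqrt{\mathbf{\Sigma}} \geq \eta_\epsilon(\Lambda)$ for some $\epsilon > 0$ and $\mathbf{c} \in \mathrm{span}(\Lambda)$, then

$$\rho_{\sqrt{\mathbf{\Sigma}}}(\Lambda + \mathbf{c}) \geq \frac{1 - \epsilon}{1 + \epsilon} \cdot \rho_{\sqrt{\mathbf{\Sigma}}}(\Lambda).$$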

Combining  the  above  lemma  with  a  bound  of  Banaszczyk  [Ban93],  we  have  the  following  tail  bound  on 
discrete  Gaussians. 

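Lemma 2.6 ([Ban93, Lemma 1.5]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice and $r \geq \eta_\epsilon(\Lambda)$ for some $\epsilon \in (0, 1)$. For any
$\mathbf{c} \in \mathrm{span}(\Lambda)$, we have

$$\Pr\left[\|D_{\Lambda+\mathbf{c},r}\| \geq r\sqrt{n}\right] \leq 2^{-n} \cdot \frac{1 + \epsilon}{1 - \epsilon}.$$

Moreover, if $\mathbf{c} = \mathbf{0}$ then the bound holds for any $r > 0$, with $\epsilon = 0$.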

The  next  lemma  bounds  the  predictability  (i.e.,  probability  of  the  most  likely  outcome  or  equivalently, 
min-entropy)  of  a  discrete  Gaussian. 

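Lemma 2.7 ([PR06, Lemma 2.11]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice and $r \geq 2\eta_\epsilon(\Lambda)$ for some $\epsilon \in (0, 1)$. For any
$\mathbf{c} \in \mathbb{R}^n$ and any $\mathbf{y} \in \Lambda + \mathbf{c}$, we have $\Pr[D_{\Lambda+\mathbf{c},r} = \mathbf{y}] \leq 2^{-n} \cdot \frac{1 + \epsilon}{1 - \epsilon}$.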


2.4  Subgaussian  Distributions  and  Random  Matrices 

For $\delta \geq 0$, we say that a random variable X (or its distribution) over $\mathbb{R}$ is $\delta$-subgaussian with parameter
$s > 0$ if for all $t \in \mathbb{R}$, the (scaled) moment-generating function satisfies

$$\mathbb{E}[\exp(2\pi t X)] \leq \exp(\delta) \cdot \exp(\pi s^2 t^2).$$
Notice that the $\exp(\pi s^2 t^2)$ term on the right is precisely the (scaled) moment-generating function of the
Gaussian distribution $D_s$. So, our definition differs from the usual definition of subgaussian only in the
additional factor of $\exp(\delta)$; we need this relaxation when working with discrete Gaussians, usually taking
$\delta = \ln(\frac{1+\epsilon}{1-\epsilon}) \approx 2\epsilon$ for the same small $\epsilon$ as in the smoothing parameter $\eta_\epsilon$.

If X is $\delta$-subgaussian, then its tails are dominated by a Gaussian of parameter s, i.e., $\Pr[|X| \geq t] \leq 2\exp(\delta)\exp(-\pi t^2/s^2)$
for all $t \geq 0$.⁴ This follows by Markov’s inequality: by scaling X we can assume $s = 1$, and we have

$$\Pr[X \geq t] = \Pr[\exp(2\pi t X) \geq \exp(2\pi t^2)] \leq \exp(\delta)\exp(\pi t^2)/\exp(2\pi t^2) = \exp(\delta)\exp(-\pi t^2).$$

The claim follows by repeating the argument with $-X$, and the union bound. Using the Taylor series
expansion of $\exp(2\pi t X)$, it can be shown that any B-bounded symmetric random variable X (i.e., $|X| \leq B$
always) is 0-subgaussian with parameter $B\sqrt{2\pi}$.
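The last claim can be probed numerically. The sketch below (hypothetical helper names) evaluates $\ln \mathbb{E}[\exp(2\pi t X)]$ on a grid of t for two symmetric distributions bounded by B = 2 and checks it against $\pi s^2 t^2$ with $s = B\sqrt{2\pi}$. A grid check is evidence, not a proof; the underlying inequality is $\mathbb{E}[\exp(\lambda X)] = \mathbb{E}[\cosh(\lambda X)] \leq \cosh(\lambda B) \leq \exp(\lambda^2 B^2/2)$ with $\lambda = 2\pi t$.

```python
import math


def log_mgf(dist, t):
    """ln E[exp(2 pi t X)] for a finite distribution given as {value: probability}."""
    return math.log(sum(pr * math.exp(2 * math.pi * t * x) for x, pr in dist.items()))


def is_subgaussian_on_grid(dist, s, delta=0.0, ts=None):
    """Check E[exp(2 pi t X)] <= exp(delta) exp(pi s^2 t^2) on a grid of t (in log scale)."""
    if ts is None:
        ts = [k / 50 for k in range(-100, 101)]
    return all(log_mgf(dist, t) <= delta + math.pi * s * s * t * t + 1e-12 for t in ts)


B = 2
uniform_sym = {x: 1 / (2 * B + 1) for x in range(-B, B + 1)}   # symmetric, |X| <= B
rademacher = {-B: 0.5, B: 0.5}                                   # extremal two-point case
s = B * math.sqrt(2 * math.pi)
print(is_subgaussian_on_grid(uniform_sym, s), is_subgaussian_on_grid(rademacher, s))
print(is_subgaussian_on_grid(rademacher, s / 2))   # too small a parameter: the check fails
```

For the two-point distribution, both sides agree to second order as $t \to 0$, so the parameter $B\sqrt{2\pi}$ cannot be lowered in that case; halving it makes the check fail.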

More generally, we say that a random vector $\mathbf{x}$ or its distribution (respectively, a random matrix $\mathbf{X}$) is $\delta$-subgaussian
(of parameter s) if all its one-dimensional marginals $\langle \mathbf{u}, \mathbf{x} \rangle$ (respectively, $\mathbf{u}^t\mathbf{X}\mathbf{v}$) for unit vectors
$\mathbf{u}, \mathbf{v}$ are $\delta$-subgaussian (of parameter s). It follows immediately from the definition that the concatenation of
independent $\delta_i$-subgaussian vectors with common parameter s, interpreted as either a vector or matrix, is
$(\sum_i \delta_i)$-subgaussian with parameter s.

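Lemma 2.8. Let $\Lambda \subset \mathbb{R}^n$ be a lattice and $s \geq \eta_\epsilon(\Lambda)$ for some $0 < \epsilon < 1$. For any $\mathbf{c} \in \mathrm{span}(\Lambda)$, $D_{\Lambda+\mathbf{c},s}$ is
$\ln(\frac{1+\epsilon}{1-\epsilon})$-subgaussian with parameter s. Moreover, it is 0-subgaussian for any $s > 0$ when $\mathbf{c} = \mathbf{0}$.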

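Proof. By scaling $\Lambda$ we can assume that $s = 1$. Let $\mathbf{x}$ have distribution $D_{\Lambda+\mathbf{c}}$, and let $\mathbf{u} \in \mathbb{R}^n$ be any unit
vector. We bound the scaled moment-generating function of the marginal $\langle \mathbf{x}, \mathbf{u} \rangle$ for any $t \in \mathbb{R}$:

$$\begin{aligned}
\rho(\Lambda + \mathbf{c}) \cdot \mathbb{E}[\exp(2\pi\langle \mathbf{x}, t\mathbf{u} \rangle)] &= \sum_{\mathbf{x} \in \Lambda + \mathbf{c}} \exp(-\pi(\langle \mathbf{x}, \mathbf{x} \rangle - 2\langle \mathbf{x}, t\mathbf{u} \rangle)) \\
&= \exp(\pi t^2) \cdot \sum_{\mathbf{x} \in \Lambda + \mathbf{c}} \exp(-\pi\langle \mathbf{x} - t\mathbf{u}, \mathbf{x} - t\mathbf{u} \rangle) \\
&= \exp(\pi t^2) \cdot \rho(\Lambda + \mathbf{c} - t\mathbf{u}).
\end{aligned}$$

Dividing by $\rho(\Lambda + \mathbf{c})$, both claims then follow by Lemma 2.5: the numerator $\rho(\Lambda + \mathbf{c} - t\mathbf{u})$ is at most $\rho(\Lambda)$, while the denominator $\rho(\Lambda + \mathbf{c})$ is at least $\frac{1-\epsilon}{1+\epsilon} \cdot \rho(\Lambda)$ in general, and equals $\rho(\Lambda)$ when $\mathbf{c} = \mathbf{0}$. □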
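The identity in the proof can be checked numerically for $\Lambda = \mathbb{Z}$. The sketch below (our own, with sums truncated to $|k| \leq 60$, an assumption that is accurate for the parameters shown) computes the moment-generating function of $D_{\mathbb{Z}+c,s}$ and checks both claims of Lemma 2.8 on a grid of t: for $c = 0$ the bound holds with $\delta = 0$ even for s far below the smoothing parameter, while for $c \neq 0$ it uses $\delta = \ln\frac{1+\epsilon}{1-\epsilon}$ with $\epsilon = \rho_{1/s}(\mathbb{Z} \setminus \{0\})$, the smallest $\epsilon$ for which $s \geq \eta_\epsilon(\mathbb{Z})$. The function name is hypothetical.

```python
import math


def discrete_gaussian_mgf(s, t, c=0.0, window=60):
    """E[exp(2 pi t X)] for X ~ D_{Z+c, s} (weights exp(-pi x^2/s^2) on x in Z + c),
    with the sums truncated to |k| <= window."""
    pts = [k + c for k in range(-window, window + 1)]
    weights = [math.exp(-math.pi * x * x / (s * s)) for x in pts]
    tilted = sum(w * math.exp(2 * math.pi * t * x) for w, x in zip(weights, pts))
    return tilted / sum(weights)


ts = [k / 20 for k in range(-20, 21)]

# c = 0: 0-subgaussian with parameter s for every s > 0 (second claim of Lemma 2.8).
for s in (0.3, 1.0, 3.0):
    assert all(discrete_gaussian_mgf(s, t) <= math.exp(math.pi * s * s * t * t) * (1 + 1e-12)
               for t in ts)

# c != 0 and s >= eta_eps(Z): ln((1+eps)/(1-eps))-subgaussian (first claim).
s, c = 2.0, 0.3
eps = 2 * sum(math.exp(-math.pi * (s * k) ** 2) for k in range(1, 50))  # rho_{1/s}(Z minus {0})
delta = math.log((1 + eps) / (1 - eps))
assert all(discrete_gaussian_mgf(s, t, c) <= math.exp(delta + math.pi * s * s * t * t)
           for t in ts)

# Far below smoothing with c != 0, delta = 0 is not enough (the coset is nearly {-0.5, 0.5}).
assert discrete_gaussian_mgf(0.3, 0.5, c=0.5) > math.exp(math.pi * 0.09 * 0.25)
print(eps, delta)
```

The tiny tolerance factor in the first check absorbs floating-point error at grid points where $ts^2$ is an integer, where the bound holds with equality.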

Here we recall a standard result from the non-asymptotic theory of random matrices; for further details,
see [Ver11]. (The proof for $\delta$-subgaussian distributions is a trivial adaptation of the 0-subgaussian case.)

Lemma 2.9. Let $\mathbf{X} \in \mathbb{R}^{n \times m}$ be a $\delta$-subgaussian random matrix with parameter s. There exists a universal
constant $C > 0$ such that for any $t \geq 0$, we have $s_1(\mathbf{X}) \leq C \cdot s \cdot (\sqrt{m} + \sqrt{n} + t)$ except with probability at
most $2\exp(\delta)\exp(-\pi t^2)$.

Empirically, for discrete Gaussians the universal constant C in the above lemma is very close to $1/\sqrt{2\pi}$.
In fact, it has been proved that $C \leq 1/\sqrt{2\pi}$ for matrices with independent identically distributed continuous
Gaussian entries.
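A quick experiment illustrates the constant. The sketch below (pure Python, hypothetical helper name, toy dimensions n = 30, m = 60) estimates $s_1(\mathbf{X})$ by power iteration on $\mathbf{X}^t\mathbf{X}$ for a matrix of i.i.d. continuous Gaussian entries with parameter s (standard deviation $s/\sqrt{2\pi}$), and reports the ratio $s_1(\mathbf{X}) / (s(\sqrt{m} + \sqrt{n}))$ next to $1/\sqrt{2\pi} \approx 0.399$. This is a single random trial, not a proof; at these sizes the ratio is typically close to, but not exactly, $1/\sqrt{2\pi}$.

```python
import math
import random


def largest_singular_value(X, iters=200, seed=0):
    """Estimate s_1(X) = max over unit u of ||X u|| by power iteration on X^t X."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    u = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        v = [sum(X[i][j] * u[j] for j in range(m)) for i in range(n)]   # v = X u
        w = [sum(X[i][j] * v[i] for i in range(n)) for j in range(m)]   # w = X^t v
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    return math.sqrt(sum(sum(X[i][j] * u[j] for j in range(m)) ** 2 for i in range(n)))


rng = random.Random(1)
n, m, s = 30, 60, 1.0
# i.i.d. continuous Gaussian entries with parameter s, i.e. standard deviation s / sqrt(2 pi).
X = [[rng.gauss(0.0, s / math.sqrt(2 * math.pi)) for _ in range(m)] for _ in range(n)]
ratio = largest_singular_value(X) / (s * (math.sqrt(m) + math.sqrt(n)))
print(f"ratio = {ratio:.3f}, 1/sqrt(2 pi) = {1 / math.sqrt(2 * math.pi):.3f}")
```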

⁴ The converse also holds (up to a small constant factor in the parameter s) when $\mathbb{E}[X] = 0$, but this will frequently not quite be
the case in our applications, which is why we define subgaussian in terms of the moment-generating function.
3  Search  to  Decision  Reduction


Here we give a new search-to-decision reduction for LWE that essentially subsumes all of the (incomparable)
prior ones given in [BFKL93, Reg05, Pei09b, ACPS09].⁵ Most notably, it handles moduli q that were not
covered before, specifically, those like $q = 2^k$ that are divisible by powers of very small primes. The only
known reduction that ours does not subsume is a different style of sample-preserving reduction recently given
in [MM11], which works for a more limited class of moduli and error distributions; extending that reduction
to the full range of parameters considered here is an interesting open problem. In what follows, $\omega(\sqrt{\log n})$
denotes some fixed function that grows faster than $\sqrt{\log n}$, asymptotically.

Theorem 3.1. Let q have prime factorization $q = p_1^{e_1} \cdots p_k^{e_k}$ for pairwise distinct $\mathrm{poly}(n)$-bounded primes $p_i$
with each $e_i \geq 1$, and let $0 < \alpha < 1/\omega(\sqrt{\log n})$. Let $\ell$ be the number of prime factors $p_i < \omega(\sqrt{\log n})/\alpha$.
There is a probabilistic polynomial-time reduction from solving search-$\mathrm{LWE}_{q,\alpha}$ (in the worst case, with
overwhelming probability) to solving decision-$\mathrm{LWE}_{q,\alpha'}$ (on the average, with non-negligible advantage) for
any $\alpha' \geq \alpha$ such that $\alpha' \geq \omega(\sqrt{\log n})/p_i^{e_i}$ for every i, and $(\alpha')^{\ell} \geq \alpha \cdot \omega(\sqrt{\log n})^{1+\ell}$.

For example, when every $p_i \geq \omega(\sqrt{\log n})/\alpha$ we have $\ell = 0$, and any $\alpha' \geq \alpha$ is acceptable. (This special
case, with the additional constraint that every $e_i = 1$, is proved in [Pei09b].) As a qualitatively new example,
when $q = p^e$ is a prime power for some (possibly small) prime p, then it suffices to let $\alpha' \geq \alpha \cdot \omega(\sqrt{\log n})^2$.
(A similar special case where $q = p^e$ for sufficiently large p and $\alpha' = \alpha < 1/p$ is proved in [ACPS09].)

Proof. We show how to recover each entry of $\mathbf{s}$ modulo a large enough power of each $p_i$, given access to the
distribution $\mathcal{A}_{\mathbf{s},\alpha}$ for some $\mathbf{s} \in \mathbb{Z}_q^n$ and to an oracle $\mathcal{O}$ solving $\mathrm{DLWE}_{q,\alpha'}$. For the parameters in the theorem
statement, we can then recover the remainder of $\mathbf{s}$ in polynomial time by rounding and standard Gaussian
elimination.

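First, observe that we can transform $\mathcal{A}_{\mathbf{s},\alpha}$ into $\mathcal{A}_{\mathbf{s},\alpha'}$ simply by adding (modulo 1) an independent sample
from $D_{\sqrt{\alpha'^2 - \alpha^2}}$ to the second component of each $(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle/q + D_\alpha \bmod 1) \in \mathbb{Z}_q^n \times \mathbb{T}$ drawn from $\mathcal{A}_{\mathbf{s},\alpha}$.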
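This step relies on independent Gaussian errors adding in quadrature: with our normalization, $D_\alpha$ has standard deviation $\alpha/\sqrt{2\pi}$, so adding an independent sample of $D_{\sqrt{\alpha'^2 - \alpha^2}}$ yields parameter $\sqrt{\alpha^2 + (\alpha'^2 - \alpha^2)} = \alpha'$. The following sketch (hypothetical names, toy values $\alpha = 0.01$, $\alpha' = 0.03$) shows the transformation and checks the resulting standard deviation empirically; for the check it tracks the error term alone, ignoring the reduction mod 1, which is immaterial at this error size.

```python
import math
import random


def sample_D(param, rng):
    """One sample of the continuous Gaussian D_param (density proportional to
    exp(-pi x^2 / param^2)), i.e. a normal variate with std dev param / sqrt(2 pi)."""
    return rng.gauss(0.0, param / math.sqrt(2 * math.pi))


def rerandomize(b, alpha, alpha_prime, rng):
    """Turn the second component b of a sample from A_{s,alpha} into one from A_{s,alpha'}."""
    assert alpha_prime >= alpha
    return (b + sample_D(math.sqrt(alpha_prime ** 2 - alpha ** 2), rng)) % 1.0


rng = random.Random(5)
alpha, alpha_prime, N = 0.01, 0.03, 100_000
extra = math.sqrt(alpha_prime ** 2 - alpha ** 2)
# Track only the error term: e ~ D_alpha plus the extra independent noise.
errors = [sample_D(alpha, rng) + sample_D(extra, rng) for _ in range(N)]
empirical = math.sqrt(sum(x * x for x in errors) / N)
print(empirical, alpha_prime / math.sqrt(2 * math.pi))
```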

We now show how to recover each entry of $\mathbf{s}$ modulo (powers of) any prime $p = p_i$ dividing q. Let
$e = e_i$, and for $j = 0, 1, \ldots, e$ define $\mathcal{A}^j_{\mathbf{s},\alpha'}$ to be the distribution over $\mathbb{Z}_q^n \times \mathbb{T}$ obtained by drawing
$(\mathbf{a}, b) \leftarrow \mathcal{A}_{\mathbf{s},\alpha'}$ and outputting $(\mathbf{a}, b + r/p^j \bmod 1)$ for a fresh uniformly random $r \leftarrow \mathbb{Z}_q$. (Clearly, this
distribution can be generated efficiently from $\mathcal{A}_{\mathbf{s},\alpha'}$.) Note that when $\alpha' \geq \omega(\sqrt{\log n})/p^j \geq \eta_\epsilon((1/p^j)\mathbb{Z})$
for some $\epsilon = \mathrm{negl}(n)$, $\mathcal{A}^j_{\mathbf{s},\alpha'}$ is negligibly far from $U = U(\mathbb{Z}_q^n \times \mathbb{T})$, and this holds at least for $j = e$
by hypothesis. Therefore, by a hybrid argument there exists some minimal $j \in [e]$ for which $\mathcal{O}$ has a
non-negligible advantage in distinguishing between $\mathcal{A}^{j-1}_{\mathbf{s},\alpha'}$ and $\mathcal{A}^j_{\mathbf{s},\alpha'}$, over a random choice of $\mathbf{s}$ and all other
randomness in the experiment. (This j can be found efficiently by measuring the behavior of $\mathcal{O}$.) Note that
when $p_i \geq \omega(\sqrt{\log n})/\alpha \geq \omega(\sqrt{\log n})/\alpha'$, the minimal j must be 1; otherwise it may be larger, but there
are at most $\ell$ of these by hypothesis. Now by standard random self-reduction and amplification techniques
(e.g., [Reg05, Lemma 4.1]), we can in fact assume that $\mathcal{O}$ accepts (respectively, rejects) with overwhelming
probability given $\mathcal{A}^{j-1}_{\mathbf{s},\alpha'}$ (resp., $\mathcal{A}^j_{\mathbf{s},\alpha'}$), for any $\mathbf{s} \in \mathbb{Z}_q^n$.

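Given access to $\mathcal{A}^{j-1}_{\mathbf{s},\alpha'}$ and $\mathcal{O}$, we can test whether $s_1 = 0 \bmod p$ by invoking $\mathcal{O}$ on samples from $\mathcal{A}^{j-1}_{\mathbf{s},\alpha'}$
that have been transformed as follows (all of what follows is analogous for $s_2, \ldots, s_n$): take each sample
$(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle/q + e + r/p^{j-1} \bmod 1) \leftarrow \mathcal{A}^{j-1}_{\mathbf{s},\alpha'}$ to

$$\left(\mathbf{a}' = \mathbf{a} - r' \cdot (q/p^j) \cdot \mathbf{e}_1,\quad b' = b = \langle \mathbf{a}', \mathbf{s} \rangle/q + e + (pr + r's_1)/p^j \bmod 1\right) \tag{3.1}$$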

3We  say  “essentially  subsumes”  because  our  reduction  is  not  very  meaningful  when  q  is  itself  a  very  small  prime,  whereas  those 
of  [BFKL93.  Reg05]  are  meaningful.  This  is  only  because  our  reduction  deals  with  the  continuous  version  of  LWE.  If  we  discretize 
the  problem,  then  for  very  small  prime  q  our  reduction  specializes  to  those  of  [BFKL93,  Reg05]. 


15 


4.  Trapdoors  for  Lattices 


for  a  fresh  r'  7Lq  (where  ei  =  (1,  0, . . . ,  0)  £  Z”).  Observe  that  if  si  =  0  mod  p,  the  transformed 
samples  arc  also  drawn  from  Afjl,  otherwise  they  are  drawn  from  ,4/  ,  because  r'si  is  uniformly  random 
modulo  p.  Therefore,  O  tells  us  which  is  the  case. 

Using the above test, we can efficiently recover $s_1 \bmod p$ by ‘shifting’ $s_1$ by each of $0, \ldots, p-1 \bmod p$ using the standard transformation that maps $A_{s,\alpha'}$ to $A_{s+t,\alpha'}$ for any desired $t \in \mathbb{Z}_q^n$ by taking $(a, b)$ to $(a, b + \langle a, t\rangle/q \bmod 1)$. (This enumeration step is where we use the fact that every $p_i$ is $\mathrm{poly}(n)$-bounded.) Moreover, we can iteratively recover $s_1 \bmod p^2, \ldots, p^{e-j+1}$ as follows: having recovered $s_1 \bmod p^i$, first ‘shift’ $A_{s,\alpha'}$ to $A_{s',\alpha'}$ where $s'_1 = 0 \bmod p^i$, then apply a similar procedure as above to recover $s'_1 \bmod p^{i+1}$: specifically, just modify the transformation in (3.1) to let $a' = a - r' \cdot (q/p^{j+i}) \cdot e_1$, so that $b' = b = \langle a', s'\rangle/q + e + (pr + r'(s'_1/p^i))/p^j$. This procedure works as long as $p^{j+i}$ divides $q$, so we can recover $s_1 \bmod p^{e-j+1}$.

Using the above reductions and the Chinese remainder theorem, and letting $j_p$ be the above minimal value of $j$ for $p = p_i$ (of which at most $\ell$ are greater than 1), from $A_{s,\alpha}$ we can recover $s$ modulo

$$P = \prod_i p_i^{e_i - (j_{p_i} - 1)} = \frac{q}{\prod_i p_i^{j_{p_i} - 1}} \geq q \cdot \left(\frac{\alpha'}{\omega(\sqrt{\log n})}\right)^{\ell} \geq q \cdot \alpha \cdot \omega(\sqrt{\log n}),$$

because $\alpha' < \omega(\sqrt{\log n})/p_i^{j_{p_i} - 1}$ for all $i$ by definition of $j_{p_i}$, and by hypothesis on $\alpha'$. By applying the ‘shift’ transformation to $A_{s,\alpha}$ we can assume that $s = 0 \bmod P$. Now every $\langle a, s\rangle/q$ is an integer multiple of $P/q \geq \alpha \cdot \omega(\sqrt{\log n})$, and since every noise term $e \leftarrow D_\alpha$ has magnitude $< (\alpha/2) \cdot \omega(\sqrt{\log n})$ with overwhelming probability, we can round the second component of every $(a, b) \leftarrow A_{s,\alpha}$ to the exact value of $\langle a, s\rangle/q \bmod 1$. From these we can solve for $s$ by Gaussian elimination, and we are done. □
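As a numerical sanity check of the final rounding step, the toy sketch below (pure Python; the parameters $q = 1000$, $P = 100$, the noise width, and the helper names round_to_grid and demo are illustrative assumptions, and continuous noise from random.gauss stands in for $D_\alpha$) draws a secret with $s \equiv 0 \bmod P$, forms noisy samples on the torus, and checks that rounding each $b$ to the nearest multiple of $P/q$ returns $\langle a, s\rangle/q \bmod 1$ exactly. It does not implement the oracle-based part of the reduction or the final Gaussian elimination.

```python
import random


def round_to_grid(b, q, P):
    """Round b in [0, 1) to the nearest multiple of P/q on the torus.

    Returns the index of that multiple, an integer in {0, ..., q/P - 1}.
    """
    return round(b * q / P) % (q // P)


def demo(q=1000, P=100, n=4, samples=200, sigma=0.005, seed=1):
    """Count how many noisy samples round back to <a, s>/q mod 1 exactly."""
    assert q % P == 0
    rng = random.Random(seed)
    # Secret with s = 0 mod P, as after the 'shift' step of the proof.
    s = [P * rng.randrange(q // P) for _ in range(n)]
    ok = 0
    for _ in range(samples):
        a = [rng.randrange(q) for _ in range(n)]
        inner = sum(x * y for x, y in zip(a, s)) % q    # a multiple of P
        b = (inner / q + rng.gauss(0.0, sigma)) % 1.0   # noisy torus element
        if round_to_grid(b, q, P) == inner // P:
            ok += 1
    return ok, samples
```

The noise width here is far below half the grid spacing $P/(2q) = 0.05$, mirroring the requirement $|e| < (\alpha/2) \cdot \omega(\sqrt{\log n}) \leq P/(2q)$ in the proof.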


4  Primitive  Lattices 

At the heart of our new trapdoor generation algorithm (described in Section 5) is the construction of a very special family of lattices which have excellent geometric properties, and admit very fast and parallelizable decoding algorithms. The lattices are defined by means of what we call a primitive matrix. We say that a matrix $G \in \mathbb{Z}_q^{n \times m}$ is primitive if its columns generate all of $\mathbb{Z}_q^n$, i.e., $G \cdot \mathbb{Z}^m = \mathbb{Z}_q^n$.$^6$

The main results of this section are summarized in the following theorem.

Theorem 4.1. For any integers $q \geq 2$, $n \geq 1$, $k = \lceil \log_2 q \rceil$ and $m = nk$, there is a primitive matrix $G \in \mathbb{Z}_q^{n \times m}$ such that

• The lattice $\Lambda^\perp(G)$ has a known basis $S \in \mathbb{Z}^{m \times m}$ with $\|\tilde{S}\| \leq \sqrt{5}$ and $\|S\| \leq \max\{\sqrt{5}, \sqrt{k}\}$. Moreover, when $q = 2^k$, we have $\tilde{S} = 2I$ (so $\|\tilde{S}\| = 2$) and $\|S\| = \sqrt{5}$.

• Both $G$ and $S$ require little storage. In particular, they are sparse (with only $O(m)$ nonzero entries) and highly structured.

• Inverting $g_G(s, e) := s^t G + e^t \bmod q$ can be performed in quasilinear $O(n \cdot \log^c n)$ time for any $s \in \mathbb{Z}_q^n$ and any $e \in \mathcal{P}_{1/2}(q \cdot B^{-t})$, where $B$ can denote either $S$ or $\tilde{S}$. Moreover, the algorithm is perfectly parallelizable, running in polylogarithmic $O(\log^c n)$ time using $n$ processors. When $q = 2^k$, the polylogarithmic term $O(\log^c n)$ is essentially just the cost of $k$ additions and shifts on $k$-bit integers.

• Preimage sampling for $f_G(x) = Gx \bmod q$ with Gaussian parameter $s \geq \|\tilde{S}\| \cdot \omega(\sqrt{\log n})$ can be performed in quasilinear $O(n \cdot \log^c n)$ time, or parallel polylogarithmic $O(\log^c n)$ time using $n$ processors. When $q = 2^k$, the polylogarithmic term is essentially just the cost of $k$ additions and shifts on $k$-bit integers, plus the (offline) generation of about $m$ random integers drawn from $D_{\mathbb{Z},s}$.

More generally, for any integer $b \geq 2$, all of the above statements hold with $k = \lceil \log_b q \rceil$, $\|\tilde{S}\| \leq \sqrt{b^2 + 1}$, and $\|S\| \leq \max\{\sqrt{b^2 + 1}, (b - 1)\sqrt{k}\}$; and when $q = b^k$, we have $\tilde{S} = bI$ and $\|S\| = \sqrt{b^2 + 1}$.

$^6$We do not say that $G$ is “full-rank,” because $\mathbb{Z}_q$ is not a field when $q$ is not prime, and the notion of rank for matrices over $\mathbb{Z}_q$ is not well defined.


The  rest  of  this  section  is  dedicated  to  the  proof  of  Theorem  4.1.  In  the  process,  we  also  make  several 
important  observations  regarding  the  implementation  of  the  inversion  and  sampling  algorithms  associated 
with  G,  showing  that  our  algorithms  are  not  just  asymptotically  fast,  but  also  quite  practical. 

Let $q \geq 2$ be an integer modulus and $k \geq 1$ be an integer dimension. Our construction starts with a primitive vector $g \in \mathbb{Z}_q^k$, i.e., a vector such that $\gcd(g_1, \ldots, g_k, q) = 1$. The vector $g$ defines a $k$-dimensional lattice $\Lambda^\perp(g^t) \subset \mathbb{Z}^k$ having determinant $|\mathbb{Z}^k/\Lambda^\perp(g^t)| = q$, because the residue classes of $\mathbb{Z}^k/\Lambda^\perp(g^t)$ are in bijective correspondence with the possible values of $\langle g, x\rangle \bmod q$ for $x \in \mathbb{Z}^k$, which cover all of $\mathbb{Z}_q$ since $g$ is primitive. Concrete primitive vectors $g$ will be described in the next subsections. Notice that when $q = \mathrm{poly}(n)$, we have $k = O(\log q) = O(\log n)$ and so $\Lambda^\perp(g^t)$ is a very low-dimensional lattice. Let $S_k \in \mathbb{Z}^{k \times k}$ be a basis of $\Lambda^\perp(g^t)$, that is, $g^t \cdot S_k = 0 \in \mathbb{Z}_q^{1 \times k}$ and $|\det(S_k)| = q$.

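The primitive vector $g$ and associated basis $S_k$ are used to define the parity-check matrix $G$ and basis $S \in \mathbb{Z}^{nk \times nk}$ as $G := I_n \otimes g^t \in \mathbb{Z}_q^{n \times nk}$ and $S := I_n \otimes S_k \in \mathbb{Z}^{nk \times nk}$. That is,

$$G := \begin{bmatrix} \cdots g^t \cdots & & \\ & \ddots & \\ & & \cdots g^t \cdots \end{bmatrix} \in \mathbb{Z}_q^{n \times nk}, \qquad S := \begin{bmatrix} S_k & & \\ & \ddots & \\ & & S_k \end{bmatrix} \in \mathbb{Z}^{nk \times nk}.$$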


Equivalently, $G$, $\Lambda^\perp(G)$, and $S$ are the direct sums of $n$ copies of $g^t$, $\Lambda^\perp(g^t)$, and $S_k$, respectively. It follows that $G$ is a primitive matrix, the lattice $\Lambda^\perp(G) \subset \mathbb{Z}^{nk}$ has determinant $q^n$, and $S$ is a basis for this lattice. It also follows (and is clear by inspection) that $\|S\| = \|S_k\|$ and $\|\tilde{S}\| = \|\tilde{S}_k\|$.

By this direct sum construction, it is immediate that inverting $g_G(s, e)$ and sampling preimages of $f_G(x)$ can be accomplished by performing the same operations $n$ times in parallel for $g_{g^t}$ and $f_{g^t}$ on the corresponding portions of the input, and concatenating the results. For preimage sampling, if each of the $f_{g^t}$-preimages has Gaussian parameter $\sqrt{\Sigma}$, then by independence, their concatenation has parameter $I_n \otimes \sqrt{\Sigma}$. Likewise, inverting $g_G$ will succeed whenever all the $n$ independent $g_{g^t}$-inversion subproblems are solved correctly.
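The direct-sum structure and the primitivity of $G$ can be illustrated concretely. In the hypothetical sketch below (the function names gadget_matrix, bit_decompose and apply_mod are ours, not the paper's), $G = I_n \otimes g^t$ is built for the vector $g^t = [1, 2, \ldots, 2^{k-1}]$ used in the following subsections, and every $u \in \mathbb{Z}_q^n$ with $q \leq 2^k$ is shown to have a preimage in $\{0,1\}^{nk}$, obtained block by block from the binary expansion of each entry. This preimage only witnesses $G \cdot \mathbb{Z}^m = \mathbb{Z}_q^n$; it is not a Gaussian preimage.

```python
def gadget_matrix(n, k):
    """G = I_n (x) g^t with g^t = [1, 2, ..., 2^{k-1}], as an n x nk list of rows."""
    G = [[0] * (n * k) for _ in range(n)]
    for r in range(n):
        for i in range(k):
            G[r][r * k + i] = 1 << i   # block r occupies columns r*k .. r*k + k - 1
    return G


def bit_decompose(u, k):
    """Binary preimage of u in Z_q^n (q <= 2^k): k-bit expansion of each entry, concatenated."""
    return [(x >> i) & 1 for x in u for i in range(k)]


def apply_mod(G, x, q):
    """Compute G x mod q."""
    return [sum(gij * xj for gij, xj in zip(row, x)) % q for row in G]
```

Each block of the preimage depends only on the matching entry of $u$, which is the same blockwise independence that lets inversion and sampling run $n$ times in parallel.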

In the next two subsections we study concrete instantiations of the primitive vector $g$, and give optimized algorithms for inverting $g_{g^t}$ and sampling preimages for $f_{g^t}$. In both subsections, we consider primitive lattices $\Lambda^\perp(g^t) \subset \mathbb{Z}^k$ defined by the vector

$$g^t := [1 \;\; 2 \;\; 4 \;\; \cdots \;\; 2^{k-1}] \in \mathbb{Z}_q^{1 \times k}, \qquad k = \lceil \log_2 q \rceil, \qquad (4.1)$$


whose entries form a geometrically increasing sequence. (We focus on powers of 2, but all our results trivially extend to other integer powers, or even mixed-integer products.) The only difference between the two subsections is in the form of the modulus $q$. We first study the case when the modulus $q = 2^k$ is a power of 2, which leads to especially simple and fast algorithms. Then we discuss how the results can be generalized to arbitrary moduli $q$. Notice that in both cases, the syndrome $\langle g, x\rangle \in \mathbb{Z}_q$ of a binary vector $x = (x_0, \ldots, x_{k-1}) \in \{0, 1\}^k$ is just the nonnegative integer with binary expansion $x$. In general, for arbitrary $x \in \mathbb{Z}^k$ the syndrome $\langle g, x\rangle \in \mathbb{Z}_q$ can be computed very efficiently by a sequence of $k$ additions and binary shifts, and a single reduction modulo $q$, which is also trivial when $q = 2^k$ is a power of 2. The syndrome computation is also easily parallelizable, leading to $O(\log k) = O(\log \log n)$ computation time using $O(k) = O(\log n)$ processors.
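The shift-and-add syndrome computation amounts to Horner's rule. A minimal sketch, assuming integer vectors stored as Python lists (the name syndrome is illustrative):

```python
def syndrome(x, q):
    """<g, x> mod q for g = (1, 2, ..., 2^{k-1}) via Horner's rule: k shifts/additions, one reduction."""
    acc = 0
    for xi in reversed(x):      # x = (x_0, ..., x_{k-1}); start from the most significant entry
        acc = (acc << 1) + xi   # one binary shift and one addition per entry
    return acc % q              # when q = 2^k this could instead be the mask acc & (q - 1)
```

For a binary $x$ the result is simply the integer whose binary digits are $x_{k-1} \cdots x_0$; for example, $x = (1, 0, 1, 1)$ gives $1 + 4 + 8 = 13$.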


4.1  Power-of-Two  Modulus 

Let $q = 2^k$ be a power of 2, and let $g$ be the geometric vector defined in Equation (4.1). Define the matrix

$$S_k := \begin{bmatrix} 2 & & & & \\ -1 & 2 & & & \\ & -1 & \ddots & & \\ & & \ddots & 2 & \\ & & & -1 & 2 \end{bmatrix} \in \mathbb{Z}^{k \times k}.$$

This is a basis for $\Lambda^\perp(g^t)$, because $g^t \cdot S_k = 0 \bmod q$ and $\det(S_k) = 2^k = q$. Clearly, all the basis vectors are short. Moreover, by orthogonalizing $S_k$ in reverse order, we have $\tilde{S}_k = 2 \cdot I_k$. This construction is summarized in the following proposition. (It generalizes in the obvious way to any integer base, not just 2.)

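Proposition 4.2. For $q = 2^k$ and $g = (1, 2, \ldots, 2^{k-1}) \in \mathbb{Z}_q^k$, the lattice $\Lambda^\perp(g^t)$ has a basis $S$ such that $\tilde{S} = 2I$ and $\|S\| \leq \sqrt{5}$. In particular, $\eta_\epsilon(\Lambda^\perp(g^t)) \leq 2r = 2 \cdot \omega(\sqrt{\log n})$ for some $\epsilon(n) = \mathrm{negl}(n)$.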
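The claims about $S_k$ can be verified exactly for small $k$. The sketch below (helper names are hypothetical; rational arithmetic via Python's fractions module) checks that $g^t S_k \equiv 0 \bmod 2^k$, that every column of $S_k$ has squared length at most 5, and that Gram-Schmidt orthogonalization in reverse column order yields $2 I_k$.

```python
from fractions import Fraction


def basis_sk_pow2(k):
    """S_k for q = 2^k: 2 on the diagonal and -1 directly below it (column i is 2 e_i - e_{i+1})."""
    S = [[0] * k for _ in range(k)]
    for i in range(k):
        S[i][i] = 2
        if i + 1 < k:
            S[i + 1][i] = -1
    return S


def columns(M):
    """Columns of a matrix stored as a list of rows."""
    return [[M[r][c] for r in range(len(M))] for c in range(len(M[0]))]


def gram_schmidt(vectors):
    """Exact Gram-Schmidt orthogonalization (over Q) in the order the vectors are given."""
    out = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for u in out:
            coef = sum(a * b for a, b in zip(w, u)) / sum(b * b for b in u)
            w = [a - coef * b for a, b in zip(w, u)]
        out.append(w)
    return out
```

Orthogonalizing from the last column backwards strips the $-1$ entry of each column against the previous $2e_{i+1}$, which is why the result is exactly $2I_k$.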


Using Proposition 4.2 and known generic algorithms [Bab85, Kle00, GPV08], it is possible to invert $g_{g^t}(s, e)$ correctly whenever $e \in \mathcal{P}_{1/2}((q/2) \cdot I)$, and sample preimages under $f_{g^t}$ with Gaussian parameter $s \geq 2r = 2 \cdot \omega(\sqrt{\log n})$. In what follows we show how the special structure of the basis $S$ leads to simpler, faster, and more practical solutions to these general lattice problems.


Inversion. Here we show how to efficiently find an unknown scalar $s \in \mathbb{Z}_q$ given $b^t = [b_0, b_1, \ldots, b_{k-1}] = s \cdot g^t + e^t = [s + e_0, 2s + e_1, \ldots, 2^{k-1}s + e_{k-1}] \bmod q$, where $e \in \mathbb{Z}^k$ is a short error vector.

An iterative algorithm works by recovering the binary digits $s_0, s_1, \ldots, s_{k-1} \in \{0, 1\}$ of $s \in \mathbb{Z}_q$, from least to most significant, as follows: first, determine $s_0$ by testing whether

$$b_{k-1} = 2^{k-1}s + e_{k-1} = (q/2)s_0 + e_{k-1} \bmod q$$

is closer to 0 or to $q/2$ (modulo $q$). Then recover $s_1$ from $b_{k-2} = 2^{k-2}s + e_{k-2} = 2^{k-1}s_1 + 2^{k-2}s_0 + e_{k-2} \bmod q$, by subtracting $2^{k-2}s_0$ and testing proximity to 0 or $q/2$, etc. It is easy to see that the algorithm produces correct output if every $e_i \in [-q/4, q/4)$, i.e., if $e \in \mathcal{P}_{1/2}(q \cdot I_k/2) = \mathcal{P}_{1/2}(q \cdot (\tilde{S}_k)^{-t})$. It can also be seen that this algorithm is exactly Babai's “nearest-plane” algorithm [Bab85], specialized to the scaled dual $q(S_k)^{-t}$ of the basis $S_k$ of $\Lambda^\perp(g^t)$, which is a basis for $\Lambda(g)$.

Formally, the iterative algorithm is: given a vector $b^t = [b_0, \ldots, b_{k-1}] \in \mathbb{Z}_q^{1 \times k}$, initialize $s \leftarrow 0$.

1. For $i = k-1, \ldots, 0$: let $s \leftarrow s + 2^{k-1-i} \cdot [b_i - 2^i \cdot s \notin [-q/4, q/4) \bmod q]$, where $[E] = 1$ if expression $E$ is true, and 0 otherwise. Also let $e_i \leftarrow b_i - 2^i \cdot s \in [-q/4, q/4)$.

2. Output $s \in \mathbb{Z}_q$ and $e = (e_0, \ldots, e_{k-1}) \in [-q/4, q/4)^k \subset \mathbb{Z}^k$.
Note that for $x \in \{0, \ldots, q-1\}$ with binary representation $(x_{k-1}x_{k-2} \cdots x_0)_2$, we have

$$[x \notin [-q/4, q/4) \bmod q] = x_{k-1} \oplus x_{k-2}.$$
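The iterative decoder translates directly into code. This is a sketch assuming $q = 2^k$ and integer inputs (the names invert_pow2, in_far_half, far_half_xor and the test helper noisy_encoding are hypothetical); it also checks the top-two-bits identity above against the direct interval test.

```python
import random


def in_far_half(y, q):
    """Iverson bracket [y not in [-q/4, q/4) mod q] for y in {0, ..., q-1}."""
    return q <= 4 * y < 3 * q


def far_half_xor(y, k):
    """The same test read off the top two bits of y: x_{k-1} XOR x_{k-2} (needs k >= 2)."""
    return (((y >> (k - 1)) ^ (y >> (k - 2))) & 1) == 1


def invert_pow2(b, k):
    """Given b_i = 2^i * s + e_i mod q (q = 2^k, e_i in [-q/4, q/4)), return (s, e)."""
    q = 1 << k
    s = 0
    e = [0] * k
    for i in range(k - 1, -1, -1):
        # s holds the low bits s_0..s_{k-2-i}, so y = 2^{k-1} s_{k-1-i} + e_i mod q.
        y = (b[i] - (s << i)) % q
        if in_far_half(y, q):
            s += 1 << (k - 1 - i)           # set bit s_{k-1-i}
        r = (b[i] - (s << i)) % q           # now r = e_i mod q
        e[i] = r - q if 2 * r >= q else r   # centred representative
    return s, e


def noisy_encoding(s, k, rng):
    """Hypothetical test helper: b = s * g^t + e^t mod 2^k with e_i uniform in [-q/4, q/4)."""
    q = 1 << k
    e = [rng.randrange(-q // 4, q // 4) for _ in range(k)]
    b = [((s << i) + e[i]) % q for i in range(k)]
    return b, e
```

Each iteration costs one shift, one subtraction and a two-bit test, in line with the "$k$ additions and shifts" cost stated in Theorem 4.1.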

There is also a non-iterative approach to decoding using a lookup table, and a hybrid approach between the two extremes. Notice that rounding each entry $b_i$ of $b$ to the nearest multiple of $2^i$ (modulo $q$, breaking ties upward) before running the above algorithm does not change the value of $s$ that is computed. This lets us precompute a lookup table that maps the $2^{k(k+1)/2} = q^{O(\lg q)}$ possible rounded values of $b$ to the correct values of $s$. The size of this table grows very rapidly for $k > 3$, but in this case we can do better if we assume slightly smaller error terms $e_i \in [-q/8, q/8)$: simply round each $b_i$ to the nearest multiple of $\max\{q/8, 2^i\}$, thus producing one of exactly $8^{k-1} = q^3/8$ possible results, whose solutions can be stored in a lookup table. Note that the result is correct, because in each coordinate the total error introduced by $e_i$ and rounding to a multiple of $q/8$ is in the range $[-q/4, q/4)$. A hybrid approach combining the iterative algorithm with table lookups of $\ell$ bits of $s$ at a time is potentially the most efficient option in practice, and is easy to devise from the above discussion.

Gaussian sampling. We now consider the preimage sampling problem for function $f_{g^t}$, i.e., the task of Gaussian sampling over a desired coset of $\Lambda^\perp(g^t)$. More specifically, we want to sample a vector from the set $\Lambda^\perp_u(g^t) = \{x \in \mathbb{Z}^k : \langle g, x\rangle = u \bmod q\}$ for a desired syndrome $u \in \mathbb{Z}_q$, with probability proportional to $\rho_s(x)$. We wish to do so for any fixed Gaussian parameter $s \geq \|\tilde{S}_k\| \cdot r = 2 \cdot \omega(\sqrt{\log n})$, which is an optimal bound on the smoothing parameter of $\Lambda^\perp(G)$.

As with inversion, there are two main approaches to Gaussian sampling, which are actually opposite extremes on a spectrum of storage/parallelism trade-offs. The first approach is essentially to precompute and store many independent samples $x \leftarrow D_{\mathbb{Z}^k, s}$, ‘bucketing’ them based on the value of $\langle g, x\rangle \in \mathbb{Z}_q$ until there is at least one sample per bucket. Because each $\langle g, x\rangle$ is statistically close to uniform over $\mathbb{Z}_q$ (by the smoothing parameter bound for $\Lambda^\perp(g^t)$), a coupon-collecting argument implies that we need to generate about $q \log q$ samples to occupy every bucket. The online part of the sampling algorithm for $\Lambda^\perp(g^t)$ is trivial, merely taking a fresh $x$ from the appropriate bucket. The downside is that the storage and precomputation requirements are rather high: in many applications, $q$ (while polynomial in the security parameter) can be in the many thousands or more.
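The offline bucketing approach can be sketched as follows. Assumptions: the helper sample_z is a simple tail-cut rejection sampler for $D_{\mathbb{Z},s,c}$ (a standard but not constant-time technique, chosen here only for brevity), and the names build_buckets and take are hypothetical. The sketch fills one bucket per syndrome in $\mathbb{Z}_q$ and then serves online requests by popping a stored sample.

```python
import math
import random


def sample_z(s, c, rng, tail=12.0):
    """Tail-cut rejection sampler: Pr[x] proportional to exp(-pi (x - c)^2 / s^2)."""
    lo, hi = math.floor(c - tail * s), math.ceil(c + tail * s)
    while True:
        x = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (x - c) ** 2 / s ** 2):
            return x


def syndrome(x, q):
    """<g, x> mod q for g = (1, 2, ..., 2^{k-1})."""
    return sum(xi << i for i, xi in enumerate(x)) % q


def build_buckets(k, q, s, rng):
    """Offline phase: draw x <- D_{Z^k,s} until every syndrome bucket in Z_q is non-empty."""
    buckets, draws = {}, 0
    while len(buckets) < q:
        x = [sample_z(s, 0.0, rng) for _ in range(k)]
        buckets.setdefault(syndrome(x, q), []).append(x)
        draws += 1
    return buckets, draws


def take(buckets, u):
    """Online phase: a fresh stored sample with syndrome u (IndexError once the bucket is empty)."""
    return buckets[u].pop()


if __name__ == "__main__":
    buckets, draws = build_buckets(4, 16, 4.0, random.Random(7))
    print(draws, "draws to fill", len(buckets), "buckets")
```

The number of draws printed by the usage example gives a rough feel for the coupon-collector cost; a real implementation would keep refilling buckets rather than stopping at one sample each.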

The second approach exploits the niceness of the orthogonalized basis $\tilde{S}_k = 2I_k$. Using this basis, the randomized nearest-plane algorithm of [Kle00, GPV08] becomes very simple and efficient, and is equivalent to the following: given a syndrome $u \in \{0, \ldots, q-1\}$ (viewed as an integer),

1. For $i = 0, \ldots, k-1$: choose $x_i \leftarrow D_{2\mathbb{Z}+u, s}$ and let $u \leftarrow (u - x_i)/2 \in \mathbb{Z}$.

2. Output $x = (x_0, \ldots, x_{k-1})$.

Observe that every Gaussian $x_i$ in the above algorithm is chosen from one of only two possible cosets of $2\mathbb{Z}$, determined by the least significant bit of $u$ at that moment. Therefore, we may precompute and store several independent Gaussian samples from each of $2\mathbb{Z}$ and $2\mathbb{Z} + 1$, and consume one per iteration when executing the algorithm. (As above, the individual samples may be generated by choosing several $x \leftarrow D_{\mathbb{Z},s}$ and bucketing each one according to its least-significant bit.) Such presampling makes the algorithm deterministic during its online phase, and because there are only two cosets, there is almost no wasted storage or precomputation. Notice, however, that this algorithm requires $k = \lg(q)$ sequential iterations.
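With $\tilde{S}_k = 2I_k$, the randomized nearest-plane loop needs only samples from the two cosets of $2\mathbb{Z}$. A minimal sketch, assuming a tail-cut rejection sampler sample_z in place of an exact $D_{\mathbb{Z},s,c}$ sampler (all function names are hypothetical): a sample $x \leftarrow D_{2\mathbb{Z}+c,s}$ is obtained as $x = 2y + c$ with $y \leftarrow D_{\mathbb{Z},s/2,-c/2}$, and the output satisfies $\langle g, x\rangle = u \bmod 2^k$ by construction.

```python
import math
import random


def sample_z(s, c, rng, tail=12.0):
    """Tail-cut rejection sampler: Pr[x] proportional to exp(-pi (x - c)^2 / s^2)."""
    lo, hi = math.floor(c - tail * s), math.ceil(c + tail * s)
    while True:
        x = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (x - c) ** 2 / s ** 2):
            return x


def sample_coset_2z(s, u, rng):
    """x <- D_{2Z+u, s}: write x = 2y + c with c = u mod 2 and y <- D_{Z, s/2, -c/2}."""
    c = u % 2
    return 2 * sample_z(s / 2, -c / 2, rng) + c


def sample_preimage_pow2(u, k, s, rng):
    """Randomized nearest plane with S~_k = 2I: returns x with <g, x> = u mod 2^k."""
    x = []
    for _ in range(k):
        xi = sample_coset_2z(s, u, rng)   # x_i has the same parity as the current u
        x.append(xi)
        u = (u - xi) // 2                 # exact division: u - x_i is even
    return x


if __name__ == "__main__":
    print(sample_preimage_pow2(37, 6, 4.0, random.Random(4)))
```

Replacing the two calls to sample_coset_2z by pops from two presampled pools (one per parity) would give the deterministic online phase described above.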

Between the extremes of the two algorithms described above, there is a hybrid algorithm that chooses $\ell \geq 1$ entries of $x$ at a time. (For simplicity, we assume that $\ell$ divides $k$ exactly, though this is not strictly necessary.) Let $h^t = [1, 2, \ldots, 2^{\ell-1}] \in \mathbb{Z}_{2^\ell}^{1 \times \ell}$ be a parity-check matrix defining the $2^\ell$-ary lattice $\Lambda^\perp(h^t) \subseteq \mathbb{Z}^\ell$, and observe that $g^t = [h^t, 2^\ell \cdot h^t, \ldots, 2^{k-\ell} \cdot h^t]$. The hybrid algorithm then works as follows:

1. For $i = 0, \ldots, k/\ell - 1$, choose $(x_{i\ell}, \ldots, x_{(i+1)\ell-1}) \leftarrow D_{\Lambda^\perp_u(h^t), s}$ and let $u \leftarrow (u - x)/2^\ell$, where $x = \sum_{j=0}^{\ell-1} x_{i\ell+j} \cdot 2^j \in \mathbb{Z}$.

2. Output $x = (x_0, \ldots, x_{k-1})$.

As above, we can precompute samples $x \leftarrow D_{\mathbb{Z}^\ell, s}$ and store them in a lookup table having $2^\ell$ buckets, indexed by the value $\langle h, x\rangle \in \mathbb{Z}_{2^\ell}$, thereby making the algorithm deterministic in its online phase.
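The hybrid algorithm with a $2^\ell$-bucket table might look as follows. This is a sketch only: sample_z is a tail-cut rejection sampler standing in for an exact discrete Gaussian sampler, build_table and sample_hybrid are hypothetical names, and the caller is assumed to refill the table when a bucket runs dry.

```python
import math
import random


def sample_z(s, c, rng, tail=12.0):
    """Tail-cut rejection sampler: Pr[x] proportional to exp(-pi (x - c)^2 / s^2)."""
    lo, hi = math.floor(c - tail * s), math.ceil(c + tail * s)
    while True:
        x = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (x - c) ** 2 / s ** 2):
            return x


def build_table(ell, s, per_bucket, rng):
    """Offline: presample x <- D_{Z^ell,s}, bucketed by <h, x> mod 2^ell, h = (1, ..., 2^{ell-1})."""
    size = 1 << ell
    table = {v: [] for v in range(size)}
    while min(len(xs) for xs in table.values()) < per_bucket:
        x = [sample_z(s, 0.0, rng) for _ in range(ell)]
        table[sum(xj << j for j, xj in enumerate(x)) % size].append(x)
    return table


def sample_hybrid(u, k, ell, table):
    """Online: fill ell entries of x per step; returns x with <g, x> = u mod 2^k (ell | k)."""
    assert k % ell == 0
    mask = (1 << ell) - 1
    x = []
    for _ in range(k // ell):
        block = table[u & mask].pop()                     # syndrome of block = u mod 2^ell
        x.extend(block)
        val = sum(xj << j for j, xj in enumerate(block))  # <h, block> over the integers
        u = (u - val) >> ell                              # exact: u - val = 0 mod 2^ell
    return x


if __name__ == "__main__":
    table = build_table(2, 4.0, 50, random.Random(3))
    print(sample_hybrid(200, 8, 2, table))
```

Setting $\ell = 1$ recovers the two-coset algorithm, and $\ell = k$ recovers the full bucketing approach, which is the trade-off spectrum the text describes.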


4.2  Arbitrary  Modulus 


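For a modulus $q$ that is not a power of 2, most of the above ideas still work, with slight adaptations. Let $k = \lceil \lg(q) \rceil$, so $q < 2^k$. As above, define $g^t := [1, 2, \ldots, 2^{k-1}] \in \mathbb{Z}_q^{1 \times k}$, but now define the matrix

$$S_k := \begin{bmatrix} 2 & & & & & q_0 \\ -1 & 2 & & & & q_1 \\ & -1 & \ddots & & & q_2 \\ & & \ddots & 2 & & \vdots \\ & & & -1 & 2 & q_{k-2} \\ & & & & -1 & q_{k-1} \end{bmatrix} \in \mathbb{Z}^{k \times k}$$

where $(q_0, \ldots, q_{k-1}) \in \{0, 1\}^k$ is the binary expansion of $q = \sum_i 2^i \cdot q_i$. Again, $S_k$ is a basis of $\Lambda^\perp(g^t)$ because $g^t \cdot S_k = 0 \bmod q$, and $\det(S_k) = q$. Moreover, the basis vectors have squared length $\|s_i\|^2 = 5$ for $i < k$ and $\|s_k\|^2 = \sum_i q_i \leq k$. The next lemma shows that $S_k$ also has a good Gram-Schmidt orthogonalization.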

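Lemma 4.3. With $S = S_k$ defined as above and orthogonalized in forward order, we have $\|\tilde{s}_i\|^2 = \frac{4 - 4^{-i}}{1 - 4^{-i}} \in (4, 5]$ for $1 \leq i < k$, and $\|\tilde{s}_k\|^2 = \frac{3q^2}{4^k - 1} < 3$.

Proof. Notice that the vectors $s_1, \ldots, s_{k-1}$ are all orthogonal to $g_k = (1, 2, 4, \ldots, 2^{k-1}) \in \mathbb{Z}^k$. Thus, the orthogonal component of $s_k$ has squared length

$$\|\tilde{s}_k\|^2 = \frac{\langle s_k, g_k\rangle^2}{\|g_k\|^2} = \frac{q^2}{\sum_{j<k} 4^j} = \frac{3q^2}{4^k - 1}.$$

Similarly, since $s_1, \ldots, s_{i-1}$ span exactly the vectors supported on the first $i$ coordinates that are orthogonal to $(1, 2, \ldots, 2^{i-1})$, the squared length of $\tilde{s}_i$ for $i < k$ can be computed as

$$\|\tilde{s}_i\|^2 = 1 + \frac{4^i}{\sum_{j<i} 4^j} = \frac{4 - 4^{-i}}{1 - 4^{-i}}.$$

□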
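Both the basis properties of $S_k$ for non-power-of-two $q$ and the Gram-Schmidt lengths in Lemma 4.3 can be checked in exact rational arithmetic for small moduli. The sketch below (hypothetical helper names; the list of test moduli is arbitrary) builds $S_k$ from the binary digits of $q$, confirms $g^t S_k \equiv 0 \bmod q$ and $|\det(S_k)| = q$, and compares the forward-order Gram-Schmidt squared lengths with the formulas of the lemma.

```python
from fractions import Fraction


def basis_sk(q):
    """S_k of Section 4.2 for q >= 3 not a power of 2, with k = ceil(log2 q)."""
    assert q >= 3 and (q & (q - 1)) != 0, "construction assumes q is not a power of 2"
    k = (q - 1).bit_length()
    S = [[0] * k for _ in range(k)]
    for i in range(k - 1):
        S[i][i], S[i + 1][i] = 2, -1     # columns s_1..s_{k-1}: 2 e_i - e_{i+1}
    for i in range(k):
        S[i][k - 1] = (q >> i) & 1       # last column: binary digits q_0..q_{k-1}
    return S


def det(M):
    """Exact determinant by fraction-valued Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d


def gs_sq_norms(S):
    """Squared Gram-Schmidt norms of the columns of S, orthogonalized in forward order."""
    cols = [[Fraction(S[r][c]) for r in range(len(S))] for c in range(len(S[0]))]
    done, norms = [], []
    for v in cols:
        w = v[:]
        for u in done:
            coef = sum(a * b for a, b in zip(v, u)) / sum(b * b for b in u)
            w = [a - coef * b for a, b in zip(w, u)]
        done.append(w)
        norms.append(sum(a * a for a in w))
    return norms


def lemma_4_3_expected(q, k):
    """(4 - 4^{-i}) / (1 - 4^{-i}) for 1 <= i < k, then 3 q^2 / (4^k - 1)."""
    vals = [(4 - Fraction(1, 4 ** i)) / (1 - Fraction(1, 4 ** i)) for i in range(1, k)]
    return vals + [Fraction(3 * q * q, 4 ** k - 1)]
```

A further consistency check is that the product of all squared Gram-Schmidt lengths telescopes to $\sum_{j<k} 4^j \cdot 3q^2/(4^k - 1) = q^2 = \det(S_k)^2$.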


This concludes the description and analysis of the primitive lattice $\Lambda^\perp(g^t)$ when $q$ is not a power of 2. Specialized inversion algorithms can also be adapted, but some care is needed. Of course, since the lattice dimension $k = O(\log n)$ is very small, one could simply use the general methods of [Bab85, Kle00, GPV08, Pei10] without worrying too much about optimizations, and satisfy all the claims made in Theorem 4.1. Below we briefly discuss alternatives for Gaussian sampling.
The offline ‘bucketing’ approach to Gaussian sampling works without any modification for arbitrary modulus, with just slightly larger Gaussian parameter $s \geq \sqrt{5} \cdot r$, because it relies only on the smoothing parameter bound of $\eta_\epsilon(\Lambda^\perp(g^t)) \leq \|\tilde{S}_k\| \cdot \omega(\sqrt{\log n})$ and the fact that the number of buckets is $q$. The randomized nearest-plane approach to sampling does not admit a specialization as simple as the one we have described for $q = 2^k$. The reason is that while the basis $S$ is sparse, its orthogonalization $\tilde{S}$ is not sparse in general. (This is in contrast to the case when $q = 2^k$, for which orthogonalizing in reverse order leads to the sparse matrix $\tilde{S} = 2I$.) Still, $\tilde{S}$ is “almost triangular,” in the sense that the off-diagonal entries decrease geometrically as one moves away from the diagonal. This may allow for optimizing the sampling algorithm by performing “truncated” scalar product computations, and still obtain an almost-Gaussian distribution on the resulting samples. An interesting alternative is to use a hybrid approach, where one first performs a single iteration of the randomized nearest-plane algorithm to take care of the last basis vector $s_k$, and then performs some variant of the convolution algorithm from [Pei10] to deal with the first $k - 1$ basis vectors $[s_1, \ldots, s_{k-1}]$, which have very small lengths and singular values. Notice that the orthogonalized component of the last vector $s_k$ is simply a scalar multiple of the primitive vector $g$, so the scalar product $\langle \tilde{s}_k, t\rangle$ (for any vector $t$ with syndrome $u = \langle g, t\rangle$), normalized by $\|\tilde{s}_k\|^2$, can be immediately computed from $u$ as $u/q$ (see Lemma 4.3).

4.3  The  Ring  Setting 

The above constructions and algorithms all transfer easily to compact lattices defined over polynomial rings (i.e., number rings), as used in the representative works [Mic02, PR06, LM06, LPR10]. A commonly used example is the cyclotomic ring $R = \mathbb{Z}[x]/(\Phi_m(x))$ where $\Phi_m(x)$ denotes the $m$th cyclotomic polynomial, which is a monic, degree-$\varphi(m)$, irreducible polynomial whose zeros are all the primitive $m$th roots of unity in $\mathbb{C}$. The ring $R$ is a $\mathbb{Z}$-module of rank $n = \varphi(m)$, i.e., it is generated as the additive integer combinations of the “power basis” elements $1, x, x^2, \ldots, x^{\varphi(m)-1}$. We let $R_q = R/qR$, the ring modulo the ideal generated by an integer $q$. For geometric concepts like error vectors and Gaussian distributions, it is usually nicest to work with the “canonical embedding” of $R$, which roughly (but not exactly) corresponds with the “coefficient embedding,” which just considers the vector of coefficients relative to the power basis.

Let $g \in R_q^k$ be a primitive vector modulo $q$, i.e., one for which the ideal generated by $q, g_1, \ldots, g_k$ is the full ring $R$. As above, the vector $g$ defines functions $f_{g^t}: R^k \to R_q$ and $g_{g^t}: R_q \times R^k \to R_q^{1 \times k}$, defined as $f_{g^t}(x) = \langle g, x\rangle = \sum_{i=1}^k g_i \cdot x_i \bmod q$ and $g_{g^t}(s, e) = s \cdot g^t + e^t \bmod q$, and the related $R$-module

$$qR^k \subseteq \Lambda^\perp(g^t) := \{x \in R^k : f_{g^t}(x) = \langle g, x\rangle = 0 \bmod q\} \subseteq R^k,$$

which has index (determinant) $q^n = |R_q|$ as an additive subgroup of $R^k$ because $g$ is primitive. Concretely, we can use the exact same primitive vector $g^t = [1, 2, \ldots, 2^{k-1}] \in R_q^{1 \times k}$ as in Equation (4.1), interpreting its entries in the ring $R_q$ rather than $\mathbb{Z}_q$.

Inversion and preimage sampling algorithms for $g_{g^t}$ and $f_{g^t}$ (respectively) are relatively straightforward to obtain, by adapting the basic approaches from the previous subsections. These algorithms are simplest when the power basis elements $1, x, x^2, \ldots, x^{\varphi(m)-1}$ are orthogonal under the canonical embedding (which is the case exactly when $m$ is a power of 2, and hence $\Phi_m(x) = x^{m/2} + 1$), because the inversion operations reduce to parallel operations relative to each of the power basis elements. We defer the details to the full version.
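To illustrate why the power-of-two case reduces to parallel scalar problems: since the entries of $g^t = [1, 2, \ldots, 2^{k-1}]$ are integers, $2^j \cdot s$ scales each coefficient of $s$, so in the coefficient embedding each coefficient position can be decoded independently with the scalar algorithm of Section 4.1. The sketch below assumes $q = 2^k$, $R_q = \mathbb{Z}_q[x]/(x^{m/2} + 1)$ with elements stored as coefficient lists of length $N = m/2$, and errors bounded coefficient-wise by $q/4$ (an illustrative assumption; the text above measures errors via the canonical embedding). Function names are hypothetical.

```python
import random


def invert_pow2(b, k):
    """Scalar decoder: given b_i = 2^i * s + e_i mod 2^k with e_i in [-q/4, q/4), return (s, e)."""
    q = 1 << k
    s, e = 0, [0] * k
    for i in range(k - 1, -1, -1):
        y = (b[i] - (s << i)) % q
        if q <= 4 * y < 3 * q:            # y is closer to q/2 than to 0
            s += 1 << (k - 1 - i)
        r = (b[i] - (s << i)) % q
        e[i] = r - q if 2 * r >= q else r
    return s, e


def invert_ring(b_polys, k):
    """Invert s * g^t + e^t over R_q, q = 2^k; b_polys[j] holds the coefficients of 2^j s + e_j.

    Multiplying s by the integer 2^j scales every coefficient, so each coefficient
    position is an independent instance of the scalar problem.
    """
    N = len(b_polys[0])
    s, e = [0] * N, [[0] * N for _ in range(k)]
    for t in range(N):
        s[t], et = invert_pow2([b_polys[j][t] for j in range(k)], k)
        for j in range(k):
            e[j][t] = et[j]
    return s, e


def random_instance(k, N, rng):
    """Hypothetical test helper: random s in R_q and coefficient-wise errors in [-q/4, q/4)."""
    q = 1 << k
    s = [rng.randrange(q) for _ in range(N)]
    e = [[rng.randrange(-q // 4, q // 4) for _ in range(N)] for _ in range(k)]
    b = [[((s[t] << j) + e[j][t]) % q for t in range(N)] for j in range(k)]
    return s, e, b
```

No ring multiplication is needed here precisely because $g$ has integer entries; the $N$ iterations of the outer loop are independent and could run in parallel.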
5 Trapdoor Generation and Operations


In this section we describe our new trapdoor generation, inversion and sampling algorithms for hard random lattices. Recall that these are lattices $\Lambda^\perp(A)$ defined by an (almost) uniformly random matrix $A \in \mathbb{Z}_q^{n \times m}$, and that the standard notion of a “strong” trapdoor for these lattices (put forward in [GPV08] and used in a large number of subsequent applications) is a short lattice basis $S \in \mathbb{Z}^{m \times m}$ for $\Lambda^\perp(A)$. There are several measures of quality for the trapdoor $S$, the most common ones being (in nondecreasing order): the maximal Gram-Schmidt length $\|\tilde{S}\|$; the maximal Euclidean length $\|S\|$; and the maximal singular value $s_1(S)$. Algorithms for generating random lattices together with high-quality trapdoor bases are given in [Ajt99, AP09]. In this section we give much simpler, faster and tighter algorithms to generate a hard random lattice with a trapdoor, and to use a trapdoor for performing standard tasks like inverting the LWE function $g_A$ and sampling preimages for the SIS function $f_A$. We also give a new, simple algorithm for delegating a trapdoor, i.e., using a trapdoor for $A$ to obtain one for a matrix $[A \mid A']$ that extends $A$, in a secure and non-reversible way.

The  following  theorem  summarizes  the  main  results  of  this  section.  Here  we  state  just  one  typical 
instantiation  with  only  asymptotic  bounds.  More  general  results  and  exact  bounds  are  presented  throughout 
the  section. 

Theorem 5.1. There is an efficient randomized algorithm $\mathrm{GenTrap}(1^n, 1^m, q)$ that, given any integers $n \geq 1$, $q \geq 2$, and sufficiently large $m = O(n \log q)$, outputs a parity-check matrix $A \in \mathbb{Z}_q^{n \times m}$ and a ‘trapdoor’ $R$ such that the distribution of $A$ is $\mathrm{negl}(n)$-far from uniform. Moreover, there are efficient algorithms Invert and SampleD that with overwhelming probability over all random choices, do the following:

• For $b^t = s^t A + e^t$, where $s \in \mathbb{Z}_q^n$ is arbitrary and either $\|e\| < q/O(\sqrt{n \log q})$ or $e \leftarrow D_{\mathbb{Z}^m, \alpha q}$ for $1/\alpha \geq \sqrt{n \log q} \cdot \omega(\sqrt{\log n})$, the deterministic algorithm $\mathrm{Invert}(R, A, b)$ outputs $s$ and $e$.

• For any $u \in \mathbb{Z}_q^n$ and large enough $s = O(\sqrt{n \log q})$, the randomized algorithm $\mathrm{SampleD}(R, A, u, s)$ samples from a distribution within $\mathrm{negl}(n)$ statistical distance of $D_{\Lambda^\perp_u(A), s \cdot \omega(\sqrt{\log n})}$.

Throughout this section, we let $G \in \mathbb{Z}_q^{n \times w}$ denote some fixed primitive matrix that admits efficient inversion and preimage sampling algorithms, as described in Theorem 4.1. (Recall that typically, $w = n\lceil \log q \rceil$ for some appropriate base of the logarithm.) All our algorithms and efficiency improvements are based on the primitive matrix $G$ and associated algorithms described in Section 4, and a new notion of trapdoor that we define next.

5.1  A  New  Trapdoor  Notion 

We  begin  by  defining  the  new  notion  of  trapdoor,  establish  some  of  its  most  important  properties,  and  give  a 
simple  and  efficient  algorithm  for  generating  hard  random  lattices  together  with  high-quality  trapdoors. 

Definition 5.2. Let $A \in \mathbb{Z}_q^{n \times m}$ and $G \in \mathbb{Z}_q^{n \times w}$ be matrices with $m \geq w \geq n$. A $G$-trapdoor for $A$ is a matrix $R \in \mathbb{Z}^{(m-w) \times w}$ such that $A \begin{bmatrix} R \\ I \end{bmatrix} = HG$ for some invertible matrix $H \in \mathbb{Z}_q^{n \times n}$. We refer to $H$ as the tag or label of the trapdoor. The quality of the trapdoor is measured by its largest singular value $s_1(R)$.

We remark that, by definition of $G$-trapdoor, if $G$ is a primitive matrix and $A$ admits a $G$-trapdoor, then $A$ is primitive as well. In particular, $\det(\Lambda^\perp(A)) = q^n$. Since the primitive matrix $G$ is typically fixed and public, we usually omit references to it, and refer to $G$-trapdoors simply as trapdoors. We remark that since $G$ is primitive, the tag $H$ in the above definition is uniquely determined by (and efficiently computable from) $A$ and the trapdoor $R$.

The following lemma says that a good basis for $\Lambda^\perp(A)$ may be obtained from knowledge of $R$. We
do  not  use  the  lemma  anywhere  in  the  rest  of  the  paper,  but  include  it  here  primarily  to  show  that  our  new 
definition  of  trapdoor  is  at  least  as  powerful  as  the  traditional  one  of  a  short  basis.  Our  algorithms  for  Gaussian 
sampling  and  LWE  inversion  do  not  need  a  full  basis,  and  make  direct  (and  more  efficient)  use  of  our  new 
notion  of  trapdoor. 

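Lemma 5.3. Let $S \in \mathbb{Z}^{w \times w}$ be any basis for $\Lambda^\perp(G)$. Let $A \in \mathbb{Z}_q^{n \times m}$ have trapdoor $R \in \mathbb{Z}^{(m-w) \times w}$ with tag $H \in \mathbb{Z}_q^{n \times n}$. Then the lattice $\Lambda^\perp(A)$ is generated by the basis

$$S_A = \begin{bmatrix} I & R \\ 0 & I \end{bmatrix} \begin{bmatrix} I & 0 \\ W & S \end{bmatrix},$$

where $W \in \mathbb{Z}^{w \times \bar{m}}$ (with $\bar{m} = m - w$) is an arbitrary solution to $GW = -H^{-1} A [I \mid 0]^T \bmod q$. Moreover, the basis $S_A$ satisfies $\|\tilde{S}_A\| \leq s_1\!\left(\begin{bmatrix} I & R \\ 0 & I \end{bmatrix}\right) \cdot \|\tilde{S}\| \leq (s_1(R) + 1) \cdot \|\tilde{S}\|$, when $S_A$ is orthogonalized in suitable order.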

Proof. It is immediate to check that $A \cdot S_A = 0 \bmod q$, so $S_A$ generates a sublattice of $\Lambda^\perp(A)$. In fact, it generates the entire lattice because $\det(S_A) = \det(S) = q^n = \det(\Lambda^\perp(A))$.

The bound on $\|\tilde{S}_A\|$ follows by simple linear algebra. Recall by Item 3 of Lemma 2.1 that $\|\tilde{B}\| = \|\tilde{S}\|$ when the columns of $B = \begin{bmatrix} I & 0 \\ W & S \end{bmatrix}$ are reordered appropriately. So it suffices to show that $\|\widetilde{TB}\| \leq s_1(T) \cdot \|\tilde{B}\|$ for any $T, B$. Let $B = QDU$ and $TB = Q'D'U'$ be Gram-Schmidt decompositions of $B$ and $TB$, respectively, with $Q, Q'$ orthogonal, $D, D'$ diagonal with nonnegative entries, and $U, U'$ upper unitriangular. We have

$$TQDU = Q'D'U' \implies T'D = D'U'',$$

where $T = Q'T'Q^{-1} \implies s_1(T') = s_1(T)$, and $U'' = U'U^{-1}$ is upper unitriangular because such matrices form a multiplicative group. Now every row of $T'D$ has Euclidean norm at most $s_1(T) \cdot \|D\| = s_1(T) \cdot \|\tilde{B}\|$, while the $i$th row of $D'U''$ has norm at least $d'_{i,i}$, the $i$th diagonal entry of $D'$. We conclude that $\|\widetilde{TB}\| = \|D'\| \leq s_1(T) \cdot \|\tilde{B}\|$, as desired. □
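Lemma 5.3 can be checked on a toy instance. The sketch below makes several illustrative assumptions not fixed by the lemma: $q = 2^k$, tag $H = I$, $G = I_n \otimes g^t$ with the power-of-two basis $S = I_n \otimes S_k$ from Section 4.1, a trapdoor $R$ with entries drawn uniformly from $\{-1, 0, 1\}$, and $W$ chosen as the binary decomposition of $-\bar{A} \bmod q$, where $\bar{A}$ denotes the first $\bar{m} = m - w$ columns of $A$ (this is one particular solution of $GW = -H^{-1}A[I \mid 0]^T$). It forms $S_A$ and confirms $A \cdot S_A \equiv 0 \bmod q$ and $|\det(S_A)| = q^n$. All function names are hypothetical.

```python
import random
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def det(M):
    """Exact determinant by fraction-valued Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d


def trapdoor_basis(n, k, mbar, rng):
    """Toy instance with q = 2^k, tag H = I and G = I_n (x) g^t; returns (q, A, S_A)."""
    q, w = 1 << k, n * k
    G = [[(1 << (c - r * k)) if r * k <= c < (r + 1) * k else 0 for c in range(w)]
         for r in range(n)]
    Abar = [[rng.randrange(q) for _ in range(mbar)] for _ in range(n)]
    R = [[rng.choice((-1, 0, 1)) for _ in range(w)] for _ in range(mbar)]  # hypothetical D
    AR = matmul(Abar, R)
    A = [Abar[r] + [(G[r][c] - AR[r][c]) % q for c in range(w)] for r in range(n)]
    # Column j of W is the bit decomposition of -Abar[:, j] mod q, so G W = -Abar (mod q).
    W = [[((((-Abar[c // k][j]) % q) >> (c % k)) & 1) for j in range(mbar)] for c in range(w)]
    # S = I_n (x) S_k with the power-of-two S_k (2 on the diagonal, -1 below it in each block).
    S = [[0] * w for _ in range(w)]
    for c in range(w):
        S[c][c] = 2
        if c % k != k - 1:
            S[c + 1][c] = -1
    RW, RS = matmul(R, W), matmul(R, S)
    # S_A = [[I, R], [0, I]] [[I, 0], [W, S]] = [[I + R W, R S], [W, S]].
    top = [[int(i == j) + RW[i][j] for j in range(mbar)] + RS[i] for i in range(mbar)]
    bottom = [W[i] + S[i] for i in range(w)]
    return q, A, top + bottom
```

The first block column vanishes because $\bar{A}(I + RW) + (G - \bar{A}R)W = \bar{A} + GW \equiv 0$, and the second because $(\bar{A}R + G - \bar{A}R)S = GS \equiv 0$.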

We  also  make  the  following  simple  but  useful  observations: 

• The rows of $\begin{bmatrix} R \\ I \end{bmatrix}$ in Definition 5.2 can appear in any order, since this just induces a permutation of $A$'s columns.

• If $R$ is a trapdoor for $A$, then it can be made into an equally good trapdoor for any extension $[A \mid B]$, by padding $R$ with zero rows; this leaves $s_1(R)$ unchanged.

• If $R$ is a trapdoor for $A$ with tag $H$, then $R$ is also a trapdoor for $A' = A - [0 \mid H'G]$ with tag $(H - H')$ for any $H' \in \mathbb{Z}_q^{n \times n}$, as long as $(H - H')$ is invertible modulo $q$. This is the main idea behind the compact IBE of [ABB10a], and can be used to give a family of “tag-based” trapdoor functions [KMO10]. In Section 6 we give explicit families of matrices $H$ having suitable properties for applications.

5.2  Trapdoor  Generation 

We now give an algorithm to generate a (pseudo)random matrix $A$ together with a $G$-trapdoor. The algorithm is straightforward, and in fact it can be easily derived from the definition of $G$-trapdoor itself. A random lattice is built by first extending the primitive matrix $G$ into a semi-random matrix $A' = [\bar{A} \mid HG]$ (where $\bar{A} \in \mathbb{Z}_q^{n \times \bar{m}}$ is chosen at random, and $H \in \mathbb{Z}_q^{n \times n}$ is the desired tag), and then applying a random transformation $T = \begin{bmatrix} I & R \\ 0 & I \end{bmatrix} \in \mathbb{Z}^{m \times m}$ to the semi-random lattice $\Lambda^\perp(A')$. Since $T$ is unimodular with inverse $T^{-1} = \begin{bmatrix} I & -R \\ 0 & I \end{bmatrix}$, by Lemma 2.1 this yields the lattice $T \cdot \Lambda^\perp(A') = \Lambda^\perp(A' \cdot T^{-1})$ associated with the parity-check matrix $A = A' \cdot T^{-1} = [\bar{A} \mid HG - \bar{A}R]$. Moreover, the distribution of $A$ is close to uniform (either statistically, or computationally) as long as the distribution of $[\bar{A} \mid 0]\, T^{-1} = [\bar{A} \mid -\bar{A}R]$ is. For details, see Algorithm 1, whose correctness is immediate.

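Algorithm 1 Efficient algorithm $\mathsf{GenTrap}^{\mathcal{D}}(\bar{A}, H)$ for generating a parity-check matrix $A$ with trapdoor $R$.
Input: Matrix $\bar{A} \in \mathbb{Z}_q^{n \times \bar{m}}$ for some $\bar{m} \geq 1$, invertible matrix $H \in \mathbb{Z}_q^{n \times n}$, and distribution $\mathcal{D}$ over $\mathbb{Z}^{\bar{m} \times w}$. (If no particular $\bar{A}, H$ are given as input, then the algorithm may choose them itself, e.g., picking $\bar{A} \in \mathbb{Z}_q^{n \times \bar{m}}$ uniformly at random, and setting $H = I$.)

Output: A parity-check matrix $A = [\bar{A} \mid A_1] \in \mathbb{Z}_q^{n \times m}$, where $m = \bar{m} + w$, and trapdoor $R$ with tag $H$.

1: Choose a matrix $R \in \mathbb{Z}^{\bar{m} \times w}$ from distribution $\mathcal{D}$.

2: Output $A = [\bar{A} \mid HG - \bar{A}R] \in \mathbb{Z}_q^{n \times m}$ and trapdoor $R \in \mathbb{Z}^{\bar{m} \times w}$.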
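As an illustration only, the following Python sketch instantiates Algorithm 1 under assumptions that the text leaves open: $q = 2^k$, the power-of-two gadget $G = I_n \otimes (1, 2, \ldots, 2^{k-1})$, and $\mathcal{D} = \mathcal{P}^{\bar{m} \times w}$ with $\mathcal{P}$ the $\{0, \pm 1\}$ distribution described below. It checks that $A\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right] = HG \pmod q$, and also exercises the tag-shift observation from Section 5.1. The helper names (`gadget`, `gen_trap`, `sample_p`) are hypothetical, and the toy sizes provide no security.

```python
import random

def gadget(n, k):
    """G = I_n (tensor) g^t with g = (1, 2, ..., 2^(k-1)): an n x nk matrix."""
    return [[(1 << (j % k)) if j // k == i else 0 for j in range(n * k)]
            for i in range(n)]

def matmul(X, Y, q):
    """Product X * Y with entries reduced into [0, q)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]

def sample_p(rng):
    """Distribution P over Z: 0 w.p. 1/2, +1 and -1 w.p. 1/4 each."""
    return rng.choice((0, 0, 1, -1))

def gen_trap(n, k, mbar, rng, Abar=None, H=None):
    """GenTrap sketch for q = 2^k and D = P^(mbar x w); returns (A, R)."""
    q, w = 1 << k, n * k
    if Abar is None:                      # default: uniformly random Abar
        Abar = [[rng.randrange(q) for _ in range(mbar)] for _ in range(n)]
    if H is None:                         # default: tag H = I
        H = [[int(i == j) for j in range(n)] for i in range(n)]
    R = [[sample_p(rng) for _ in range(w)] for _ in range(mbar)]     # step 1
    HG, AR = matmul(H, gadget(n, k), q), matmul(Abar, R, q)
    A1 = [[(a - b) % q for a, b in zip(x, y)] for x, y in zip(HG, AR)]
    return [x + y for x, y in zip(Abar, A1)], R                       # step 2

rng = random.Random(2024)
n, k, mbar = 2, 8, 40
q, w = 1 << k, n * k
H = [[1, 1], [0, 1]]                      # an invertible tag (det = 1)
A, R = gen_trap(n, k, mbar, rng, H=H)
Rbar = R + [[int(i == j) for j in range(w)] for i in range(w)]       # [R; I]
assert matmul(A, Rbar, q) == matmul(H, gadget(n, k), q)              # A [R; I] = HG

# Tag shift: A' = A - [0 | H'G] keeps the trapdoor R, now with tag H - H'.
Hp = [[0, 1], [1, 0]]                     # H - H' = [[1, 0], [-1, 1]], still invertible
HpG = matmul(Hp, gadget(n, k), q)
Ap = [row[:mbar] + [(a - b) % q for a, b in zip(row[mbar:], hrow)]
      for row, hrow in zip(A, HpG)]
H_diff = [[(a - b) % q for a, b in zip(x, y)] for x, y in zip(H, Hp)]
assert matmul(Ap, Rbar, q) == matmul(H_diff, gadget(n, k), q)
```

Calling `gen_trap` without `H` reproduces the default choice $H = I$ from the algorithm's input description.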


We next describe two types of GenTrap instantiations. The first type generates a trapdoor $R$ for a statistically near-uniform output matrix $A$ using dimension $\bar{m} \approx n \log q$ or less (there is a trade-off between $\bar{m}$ and the trapdoor quality $s_1(R)$). The second type generates a computationally pseudorandom $A$ (under the LWE assumption) using dimension $\bar{m} = 2n$; this pseudorandom construction is the first of its kind in the literature. Certain applications allow for an optimization that decreases $\bar{m}$ by an additive $n$ term; this is most significant in the computationally secure construction because it yields $\bar{m} = n$.

Statistical instantiation. This instantiation works for any parameter $\bar{m}$ and distribution $\mathcal{D}$ over $\mathbb{Z}^{\bar{m} \times w}$ having the following two properties:

1. Subgaussianity: $\mathcal{D}$ is subgaussian with some parameter $s > 0$ (or $\delta$-subgaussian for some small $\delta$). This implies by Lemma 2.9 that $R \leftarrow \mathcal{D}$ has $s_1(R) = s \cdot O(\sqrt{\bar{m}} + \sqrt{w})$, except with probability $2^{-\Omega(\bar{m} + w)}$. (Recall that the constant factor hidden in the $O(\cdot)$ expression is $\approx 1/\sqrt{2\pi}$.)

2. Regularity: for $\bar{A} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$ and $R \leftarrow \mathcal{D}$, $A = [\bar{A} \mid \bar{A}R]$ is $\delta$-uniform for some $\delta = \mathrm{negl}(n)$.

In fact, there is no loss in security if $\bar{A}$ contains an identity matrix $I$ as a submatrix and is otherwise uniform, since this corresponds with the Hermite normal form of the SIS and LWE problems. See, e.g., [MR09, Section 5] for further details.

For example, let $\mathcal{D} = \mathcal{P}^{\bar{m} \times w}$, where $\mathcal{P}$ is the distribution over $\mathbb{Z}$ that outputs 0 with probability 1/2, and $\pm 1$ each with probability 1/4. Then $\mathcal{P}$ (and hence $\mathcal{D}$) is 0-subgaussian with parameter $\sqrt{2\pi}$, and satisfies the regularity condition (for any $q$) for $\delta \leq \frac{w}{2}\sqrt{q^n / 2^{\bar{m}}}$, by a version of the leftover hash lemma (see, e.g., [AP09, Section 2.2.1]). Therefore, we can use any $\bar{m} \geq n \lg q + 2 \lg \frac{w}{2\delta}$.
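A small worked example of this bound, with toy parameters chosen only for illustration ($n = 4$, $q = 2^8$, $w = n \lg q = 32$, $\delta = 2^{-10}$; real parameters use a negligible $\delta$): the sketch computes the smallest $\bar{m}$ with $\bar{m} \geq n \lg q + 2\lg(w/(2\delta))$, confirms that the leftover-hash bound $(w/2)\sqrt{q^n/2^{\bar{m}}}$ is then at most $\delta$ (and exceeds it one step earlier), and empirically checks that $\mathcal{P}$ has mean 0 and second moment 1/2. The function names are hypothetical.

```python
import math
import random

def min_mbar(n, q, w, delta):
    """Smallest integer mbar with mbar >= n lg q + 2 lg(w / (2 delta))."""
    return math.ceil(n * math.log2(q) + 2 * math.log2(w / (2 * delta)))

def regularity_bound(n, q, w, mbar):
    """Leftover-hash-lemma bound (w / 2) * sqrt(q^n / 2^mbar) on the distance from uniform."""
    return (w / 2) * math.sqrt(q ** n / 2 ** mbar)

# Toy, insecure parameters chosen only to make the arithmetic easy to follow.
n, q, delta = 4, 2 ** 8, 2.0 ** -10
w = n * 8                                  # w = n lg q for the power-of-two gadget
mbar = min_mbar(n, q, w, delta)
print(mbar, regularity_bound(n, q, w, mbar) <= delta)   # prints: 60 True

# Empirical moments of P (0 w.p. 1/2, +1 and -1 w.p. 1/4 each).
rng = random.Random(0)
xs = [rng.choice((0, 0, 1, -1)) for _ in range(100_000)]
mean = sum(xs) / len(xs)                   # should be close to 0
second_moment = sum(x * x for x in xs) / len(xs)   # should be close to 1/2
```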

As another important example, let $\mathcal{D} = D_{\mathbb{Z}, s}^{\bar{m} \times w}$ be a discrete Gaussian distribution for some $s \geq \eta_\epsilon(\mathbb{Z})$ and $\epsilon = \mathrm{negl}(n)$. Then $\mathcal{D}$ is 0-subgaussian with parameter $s$ by Lemma 2.8, and satisfies the regularity condition when $\bar{m}$ satisfies the bound (2.2) from Lemma 2.4. For example, letting $s = 2\eta_\epsilon(\mathbb{Z})$ we can use any $\bar{m} = n \lg q + \omega(\log n)$. (Other trade-offs between $s$ and $\bar{m}$ are possible, potentially using a different choice of $G$, and more exact bounds on the error probabilities can be worked out from the lemma statements.) Moreover, by Lemmas 2.4 and 2.8 we have that with overwhelming probability over the choice of $\bar{A}$, the conditional distribution of $R$ given $A = [\bar{A} \mid \bar{A}R]$ is $\mathrm{negl}(n)$-subgaussian with parameter $s$. We will use this fact in some of our applications in Section 6.

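Computational instantiation. Let $\bar{A} = [I \mid \hat{A}] \in \mathbb{Z}_q^{n \times \bar{m}}$ for $\bar{m} = 2n$, and let $\mathcal{D} = D_{\mathbb{Z}, s}^{\bar{m} \times w}$ for some $s = \alpha q$, where $\alpha > 0$ is an LWE relative error rate (and typically $\alpha q > \sqrt{n}$). Clearly, $\mathcal{D}$ is 0-subgaussian with parameter $\alpha q$. Also, $[\bar{A} \mid \bar{A}R = \hat{A}R_2 + R_1]$ for $R = \left[\begin{smallmatrix} R_1 \\ R_2 \end{smallmatrix}\right] \leftarrow \mathcal{D}$ is exactly an instance of decision-$\mathrm{LWE}_{n,q,\alpha}$ (in its normal form), and hence is pseudorandom (ignoring the identity submatrix) assuming that the problem is hard.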
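The following sketch (toy sizes, no security) checks the identity behind the computational instantiation: with $\bar{A} = [I \mid \hat{A}]$ and $R = \left[\begin{smallmatrix} R_1 \\ R_2 \end{smallmatrix}\right]$, we get $\bar{A}R = \hat{A}R_2 + R_1$, so each column of $\bar{A}R$ consists of $n$ LWE samples with public matrix $\hat{A}$, secret equal to a column of $R_2$, and errors from the matching column of $R_1$. Rounding a continuous Gaussian stands in for $D_{\mathbb{Z},s}$ here; that substitution and the modulus 3329 are assumptions of the sketch only.

```python
import math
import random

def rounded_gaussian(s, rng):
    """Stand-in for D_{Z,s}: round a continuous Gaussian of std s / sqrt(2 pi)."""
    return round(rng.gauss(0.0, s / math.sqrt(2 * math.pi)))

rng = random.Random(1)
n, q, w = 4, 3329, 6                       # toy sizes; no security intended
alpha_q = 4.0                              # Gaussian parameter s = alpha * q
mbar = 2 * n
Ahat = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
Abar = [[int(i == j) for j in range(n)] + Ahat[i] for i in range(n)]   # [I | Ahat]
R = [[rounded_gaussian(alpha_q, rng) for _ in range(w)] for _ in range(mbar)]
R1, R2 = R[:n], R[n:]                      # R = [R1; R2]

AbarR = [[sum(Abar[i][t] * R[t][j] for t in range(mbar)) % q for j in range(w)]
         for i in range(n)]
# Column j of Ahat R2 + R1: n LWE samples <row i of Ahat, secret> + error,
# with secret = column j of R2 and errors = column j of R1.
lwe = [[(sum(Ahat[i][t] * R2[t][j] for t in range(n)) + R1[i][j]) % q
        for j in range(w)] for i in range(n)]
assert AbarR == lwe
```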

Further optimizations. If an application only uses a single tag $H = I$ (as is the case with, for example, GPV signatures [GPV08]), then we can save an additive $n$ term in the dimension $\bar{m}$ (and hence in the total dimension $m$): instead of putting an identity submatrix in $\bar{A}$, we can instead use the identity submatrix from $G$ (which exists without loss of generality, since $G$ is primitive) and conceal the remainder of $G$ using either of the above methods.
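Assuming the power-of-two gadget $G = I_n \otimes (1, 2, \ldots, 2^{k-1})$ (the text only requires $G$ to be primitive), the identity submatrix mentioned above is easy to locate: it is formed by the columns holding the entry 1 of each copy of $g^t$. A minimal sketch with hypothetical helper names:

```python
def gadget(n, k):
    """G = I_n (tensor) (1, 2, ..., 2^(k-1)), an n x nk matrix."""
    return [[(1 << (j % k)) if j // k == i else 0 for j in range(n * k)]
            for i in range(n)]

def identity_columns(G):
    """Column indices j_0, ..., j_{n-1} with column j_i equal to e_i, or None."""
    n = len(G)
    found = {}
    for j in range(len(G[0])):
        col = [G[i][j] for i in range(n)]
        if sorted(col) == [0] * (n - 1) + [1]:      # exactly one 1, zeros elsewhere
            found.setdefault(col.index(1), j)
    return [found[i] for i in range(n)] if len(found) == n else None

n, k = 3, 5
G = gadget(n, k)
idx = identity_columns(G)
print(idx)                                   # prints: [0, 5, 10]
rest = [j for j in range(n * k) if j not in idx]   # columns of G still to conceal
```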

All  of  the  above  ideas  also  translate  immediately  to  the  ring  setting  (see  Section  4.3),  using  an  appropriate 
regularity  lemma  (e.g.,  the  one  in  [LPR10])  for  a  statistical  instantiation,  and  the  ring-LWE  problem  for  a 
computationally  secure  instantiation. 

5.3  LWE  Inversion 

Algorithm 2 below shows how to use a trapdoor to solve LWE relative to $A$. Given a trapdoor $R$ for $A \in \mathbb{Z}_q^{n \times m}$ and an LWE instance $\mathbf{b}^t = \mathbf{s}^t A + \mathbf{e}^t \bmod q$ for some short error vector $\mathbf{e} \in \mathbb{Z}^m$, the algorithm recovers $\mathbf{s}$ (and $\mathbf{e}$). This naturally yields an inversion algorithm for the injective trapdoor function $g_A(\mathbf{s}, \mathbf{e}) = \mathbf{s}^t A + \mathbf{e}^t \bmod q$, which is hard to invert (and whose output is pseudorandom) if LWE is hard.




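Algorithm 2 Efficient algorithm $\mathsf{Invert}^{\mathcal{O}}(R, A, \mathbf{b})$ for inverting the function $g_A(\mathbf{s}, \mathbf{e})$.

Input: An oracle $\mathcal{O}$ for inverting the function $g_G(\hat{\mathbf{s}}, \hat{\mathbf{e}})$ when $\hat{\mathbf{e}} \in \mathbb{Z}^w$ is suitably small.

• parity-check matrix $A \in \mathbb{Z}_q^{n \times m}$;

• G-trapdoor $R \in \mathbb{Z}^{\bar{m} \times kn}$ for $A$ with invertible tag $H \in \mathbb{Z}_q^{n \times n}$;

• vector $\mathbf{b}^t = g_A(\mathbf{s}, \mathbf{e}) = \mathbf{s}^t A + \mathbf{e}^t$ for any $\mathbf{s} \in \mathbb{Z}_q^n$ and suitably small $\mathbf{e} \in \mathbb{Z}^m$.
Output: The vectors $\mathbf{s}$ and $\mathbf{e}$.

1: Compute $\hat{\mathbf{b}}^t = \mathbf{b}^t \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]$.

2: Get $(\hat{\mathbf{s}}, \hat{\mathbf{e}}) \leftarrow \mathcal{O}(\hat{\mathbf{b}})$.

3: return $\mathbf{s} = H^{-t}\hat{\mathbf{s}}$ and $\mathbf{e} = \mathbf{b} - A^t\mathbf{s}$ (interpreted as a vector in $\mathbb{Z}^m$ with entries in $[-\frac{q}{2}, \frac{q}{2})$).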
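A minimal sketch of Algorithm 2, assuming $q = 2^k$, $G = I_n \otimes (1, 2, \ldots, 2^{k-1})$, tag $H = I$, and trapdoor entries in $\{0, \pm 1\}$. The oracle `gadget_invert` is one possible power-of-two decoder written for this illustration (it recovers each entry of $\hat{\mathbf{s}}$ from the least significant bit upward and tolerates every $\hat{e}_j \in [-q/4, q/4)$, i.e., $\hat{\mathbf{e}} \in \mathcal{P}_{1/2}(q \cdot B^{-t})$ for $B = 2I$); it is not necessarily the construction of Section 4. All names are hypothetical.

```python
import random

def centered(x, q):
    """Representative of x mod q in [-q/2, q/2)."""
    x %= q
    return x - q if x >= q // 2 else x

def gadget_invert(bhat, n, k):
    """Oracle O for q = 2^k and G = I_n (tensor) (1, 2, ..., 2^(k-1)).

    Input bhat^t = shat^t G + ehat^t (mod q) with every ehat_j in [-q/4, q/4).
    """
    q = 1 << k
    shat = []
    for i in range(n):
        block = bhat[i * k:(i + 1) * k]          # entries 2^j * s_i + e (mod q)
        s = 0
        for t in range(k):
            # Remove the bits found so far from entry k-1-t; what is left is
            # 2^(k-1) * (bit t of s_i) + error (mod q).
            val = (block[k - 1 - t] - (s << (k - 1 - t))) % q
            if q // 4 <= val < 3 * q // 4:
                s |= 1 << t
        shat.append(s)
    ehat = [centered(bhat[i * k + j] - (shat[i] << j), q)
            for i in range(n) for j in range(k)]
    return shat, ehat

def invert(R, A, b, n, k):
    """Invert^O sketch with tag H = I, so step 3 needs no multiplication by H^{-t}."""
    q, w, mbar = 1 << k, n * k, len(R)
    m = mbar + w
    Rbar = R + [[int(i == j) for j in range(w)] for i in range(w)]            # [R; I]
    bhat = [sum(b[t] * Rbar[t][j] for t in range(m)) % q for j in range(w)]  # step 1
    s, _ = gadget_invert(bhat, n, k)                                         # step 2
    e = [centered(b[j] - sum(s[i] * A[i][j] for i in range(n)), q)           # step 3
         for j in range(m)]
    return s, e

rng = random.Random(5)
n, k, mbar = 2, 16, 40
q, w = 1 << k, n * k
Abar = [[rng.randrange(q) for _ in range(mbar)] for _ in range(n)]
R = [[rng.choice((0, 0, 1, -1)) for _ in range(w)] for _ in range(mbar)]
A = [Abar[i] + [(((1 << (j % k)) if j // k == i else 0)
                 - sum(Abar[i][t] * R[t][j] for t in range(mbar))) % q
                for j in range(w)] for i in range(n)]                    # [Abar | G - Abar R]
s_true = [rng.randrange(q) for _ in range(n)]
e_true = [rng.choice((0, 0, 1, -1)) for _ in range(mbar + w)]
b = [(sum(s_true[i] * A[i][j] for i in range(n)) + e_true[j]) % q
     for j in range(mbar + w)]
s, e = invert(R, A, b, n, k)
assert s == s_true and e == e_true
```

Here $|\hat{e}_j| \leq \bar{m} + 1 = 41$, far below $q/4$, so decoding always succeeds; larger trapdoors or errors would need the length condition of Theorem 5.4.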


Theorem 5.4. Suppose that oracle $\mathcal{O}$ in Algorithm 2 correctly inverts $g_G(\hat{\mathbf{s}}, \hat{\mathbf{e}})$ for any error vector $\hat{\mathbf{e}} \in \mathcal{P}_{1/2}(q \cdot B^{-t})$ for some $B$. Then for any $\mathbf{s}$ and $\mathbf{e}$ of length $\|\mathbf{e}\| < q/(2\|B\|s)$ where $s = \sqrt{s_1(R)^2 + 1}$, Algorithm 2 correctly inverts $g_A(\mathbf{s}, \mathbf{e})$. Moreover, for any $\mathbf{s}$ and random $\mathbf{e} \leftarrow D_{\mathbb{Z}^m, \alpha q}$ where $1/\alpha \geq 2\|B\| s \cdot \omega(\sqrt{\log n})$, the algorithm inverts successfully with overwhelming probability over the choice of $\mathbf{e}$.

Note that using our constructions from Section 4, we can implement $\mathcal{O}$ so that either $\|B\| = 2$ (for $q$ a power of 2, where $B = \tilde{S} = 2I$) or $\|B\| = \sqrt{5}$ (for arbitrary $q$).
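Proof. Let $\bar{R} = \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]$, and note that $s = s_1(\bar{R})$, since $\bar{R}^t\bar{R} = R^tR + I$. By the above description, the algorithm works correctly when $\bar{R}^t\mathbf{e} \in \mathcal{P}_{1/2}(q \cdot B^{-t})$; equivalently, when $(\mathbf{b}_i^t \bar{R}^t)\mathbf{e}/q \in [-\frac{1}{2}, \frac{1}{2})$ for all $i$, where $\mathbf{b}_i$ denotes the $i$th column of $B$. By definition of $s$, we have $\|\mathbf{b}_i^t \bar{R}^t\| \leq s\|B\|$. If $\|\mathbf{e}\| < q/(2\|B\|s)$, then $|(\mathbf{b}_i^t \bar{R}^t)\mathbf{e}/q| < 1/2$ by Cauchy-Schwarz. Moreover, if $\mathbf{e}$ is chosen at random from $D_{\mathbb{Z}^m, \alpha q}$, then by the fact that $\mathbf{e}$ is 0-subgaussian (Lemma 2.8) with parameter $\alpha q$, the probability that $|(\mathbf{b}_i^t \bar{R}^t)\mathbf{e}/q| \geq 1/2$ is negligible, and the second claim follows by the union bound. □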
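The proof relies on $s = s_1(\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]) = \sqrt{s_1(R)^2 + 1}$. The sketch below checks this numerically with power iteration on a small hand-picked $R$ (for which $R^tR = \left[\begin{smallmatrix} 2 & 1 \\ 1 & 2 \end{smallmatrix}\right]$, so $s_1(R)^2 = 3$), and then evaluates the error bound $q/(2\|B\|s)$ for the illustrative choice $q = 2^{16}$ and $\|B\| = 2$. The function name is hypothetical.

```python
import math

def top_singular_value(M, iters=500):
    """Largest singular value of M, by power iteration on M^t M."""
    rows, cols = len(M), len(M[0])
    x = [1.0] + [0.5] * (cols - 1)
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(cols)) for i in range(rows)]   # M x
        x = [sum(M[i][j] * y[i] for i in range(rows)) for j in range(cols)]   # M^t M x
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    Mx = [sum(M[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(v * v for v in Mx))

R = [[1, 0], [0, -1], [1, 1]]              # R^t R = [[2, 1], [1, 2]], so s_1(R)^2 = 3
Rbar = R + [[1, 0], [0, 1]]                # [R; I]
s1_R, s = top_singular_value(R), top_singular_value(Rbar)
print(round(s1_R ** 2, 9), round(s ** 2, 9))          # prints: 3.0 4.0

q, B_norm = 2 ** 16, 2.0                   # ||B|| = 2 when q is a power of 2
max_err = q / (2 * B_norm * s)             # errors with ||e|| < max_err are decoded
print(round(max_err, 6))                   # prints: 8192.0
```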

5.4  Gaussian  Sampling 

Here we show how to use a trapdoor for efficient Gaussian preimage sampling for the function $f_A$, i.e., sampling from a discrete Gaussian over a desired coset of $\Lambda^\perp(A)$. Our precise goal is, given a G-trapdoor $R$ (with tag $H$) for matrix $A$ and a syndrome $\mathbf{u} \in \mathbb{Z}_q^n$, to sample from the spherical discrete Gaussian $D_{\Lambda_{\mathbf{u}}^\perp(A), s}$ for relatively small parameter $s$. As we show next, this task can be reduced, via some efficient pre- and post-processing, to sampling from any sufficiently narrow (not necessarily spherical) Gaussian over the primitive lattice $\Lambda^\perp(G)$.

The main ideas behind our algorithm, which is described formally in Algorithm 3, are as follows. For simplicity, suppose that $R$ has tag $H = I$, so $A\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right] = G$, and suppose we have a subroutine for Gaussian sampling from any desired coset of $\Lambda^\perp(G)$ with some small, fixed parameter $\sqrt{\Sigma_G} \geq \eta_\epsilon(\Lambda^\perp(G))$. For example, Section 4 describes algorithms for which $\sqrt{\Sigma_G}$ is either 2 or $\sqrt{5}$. (Throughout this summary we omit the small rounding factor $r = \omega(\sqrt{\log n})$ from all Gaussian parameters.) The algorithm for sampling from a coset $\Lambda_{\mathbf{u}}^\perp(A)$ follows from two main observations:

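1. If we sample a Gaussian $\mathbf{z}$ with parameter $\sqrt{\Sigma_G}$ from $\Lambda_{\mathbf{u}}^\perp(G)$ and produce $\mathbf{y} = \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\mathbf{z}$, then $\mathbf{y}$ is Gaussian over the (non-full-rank) set $\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\Lambda_{\mathbf{u}}^\perp(G) \subsetneq \Lambda_{\mathbf{u}}^\perp(A)$ with parameter $\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\sqrt{\Sigma_G}$ (i.e., covariance $\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\Sigma_G\,[R^t \; I]$). The (strict) inclusion holds because for any $\mathbf{y} = \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\mathbf{z}$ where $\mathbf{z} \in \Lambda_{\mathbf{u}}^\perp(G)$, we have

$$A\mathbf{y} = \left(A\begin{bmatrix} R \\ I \end{bmatrix}\right)\mathbf{z} = G\mathbf{z} = \mathbf{u}.$$

Note that $s_1(\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right] \cdot \sqrt{\Sigma_G}) \leq s_1(\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]) \cdot s_1(\sqrt{\Sigma_G}) \leq \sqrt{s_1(R)^2 + 1} \cdot s_1(\sqrt{\Sigma_G})$, so $\mathbf{y}$'s distribution is only about an $s_1(R)$ factor wider than that of $\mathbf{z}$ over $\Lambda_{\mathbf{u}}^\perp(G)$. However, $\mathbf{y}$ lies in a non-full-rank subset of $\Lambda_{\mathbf{u}}^\perp(A)$, and its distribution is "skewed" (non-spherical). This leaks information about the trapdoor $R$, so we cannot just output $\mathbf{y}$.

2. To sample from a spherical Gaussian over all of $\Lambda_{\mathbf{u}}^\perp(A)$, we use the "convolution" technique from [Pei10] to correct for the above-described problems with the distribution of $\mathbf{y}$. Specifically, we first choose a Gaussian perturbation $\mathbf{p} \in \mathbb{Z}^m$ having covariance $s^2 - \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\Sigma_G\,[R^t \; I]$, which is well-defined as long as $s \geq s_1(\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right] \cdot \sqrt{\Sigma_G})$. We then sample $\mathbf{y} = \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\mathbf{z}$ as above for an adjusted syndrome $\mathbf{v} = \mathbf{u} - A\mathbf{p}$, and output $\mathbf{x} = \mathbf{p} + \mathbf{y}$. Now the support of $\mathbf{x}$ is all of $\Lambda_{\mathbf{u}}^\perp(A)$ (in particular $A\mathbf{x} = A\mathbf{p} + \mathbf{v} = \mathbf{u}$), and because the covariances of $\mathbf{p}$ and $\mathbf{y}$ are additive (subject to some mild hypotheses), the overall distribution of $\mathbf{x}$ is spherical with Gaussian parameter $s$ that can be as small as $s \approx s_1(R) \cdot s_1(\sqrt{\Sigma_G})$.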

Quality analysis. Algorithm 3 can sample from a discrete Gaussian with parameter $s \cdot \omega(\sqrt{\log n})$ where $s$ can be as small as $\sqrt{s_1(R)^2 + 1} \cdot \sqrt{s_1(\Sigma_G) + 2}$. We stress that this is only very slightly larger (a factor of at most $\sqrt{6/4} \leq 1.23$) than the bound $(s_1(R) + 1) \cdot \|\tilde{S}\|$ from Lemma 5.3 on the largest Gram-Schmidt norm of a lattice basis derived from the trapdoor $R$. (Recall that our constructions from Section 4 give $s_1(\Sigma_G) = \|\tilde{S}\|^2 = 4$ or 5.) In the iterative "randomized nearest-plane" sampling algorithm of [Kle00, GPV08], the Gaussian parameter $s$ is lower-bounded by the largest Gram-Schmidt norm of the orthogonalized input basis (times the same $\omega(\sqrt{\log n})$ factor used in our algorithm). Therefore, the efficiency and parallelism of Algorithm 3 come at almost no cost in quality versus slower, iterative algorithms that use high-precision arithmetic. (It seems very likely that the corresponding small loss in security can easily be mitigated with slightly larger parameters, while still yielding a significant net gain in performance.)

Runtime  analysis.  We  now  analyze  the  computational  cost  of  Algorithm  3,  with  a  focus  on  optimizing  the 
online  runtime  and  parallelism  (sometimes  at  the  expense  of  the  offline  phase,  which  we  do  not  attempt  to 
optimize). 

The offline phase is dominated by sampling from $D_{\mathbb{Z}^m, r\sqrt{\Sigma}}$ for some fixed (typically non-spherical) covariance matrix $\Sigma > I$. By [Pei10, Theorem 3.1], this can be accomplished (up to any desired statistical distance) simply by sampling a continuous Gaussian $D_{r\sqrt{\Sigma - I}}$ with sufficient precision, then independently randomized-rounding each entry of the sampled vector to $\mathbb{Z}$ using Gaussian parameter $r \geq \eta_\epsilon(\mathbb{Z})$.

Naively, the online work is dominated by the computation of $H^{-1}(\mathbf{u} - \bar{\mathbf{w}})$ and $R\mathbf{z}$ (plus the call to $\mathcal{O}(\mathbf{v})$, which as described in Section 4 requires only $O(\log^c n)$ work, or one table lookup, by each of $n$ processors in parallel). In general, the first computation takes $O(n^2)$ scalar multiplications and additions in $\mathbb{Z}_q$, while the latter takes $O(\bar{m} \cdot w)$, which is typically $O(n^2 \log^2 q)$. (Obviously, both computations are perfectly parallelizable.) However, the special form of $\mathbf{z}$, and often of $H$, allow for some further asymptotic and practical optimizations: since $\mathbf{z}$ is typically produced by concatenating $n$ independent dimension-$k$ subvectors that are sampled offline, we can precompute much of $R\mathbf{z}$ by pre-multiplying each subvector by each of the $n$ blocks of $k$ columns in $R$. This reduces the online computation of $R\mathbf{z}$ to the summation of $n$ dimension-$\bar{m}$ vectors, or $O(n^2 \log q)$ scalar additions (and no multiplications) in $\mathbb{Z}_q$. As for multiplication by $H^{-1}$, in some applications (like GPV signatures) $H$ is always the identity $I$, in which case multiplication is unnecessary; in all other applications we know of, $H$ actually represents multiplication in a certain extension field/ring of $\mathbb{Z}_q$, which can be computed in $O(n \log n)$ scalar operations and depth $O(\log n)$. In conclusion, the asymptotic cost of the online phase is still dominated by computing $R\mathbf{z}$, which takes $\tilde{O}(n^2)$ work, but the hidden constants are small and many practical speedups are possible.
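The precomputation trick for $R\mathbf{z}$ can be sketched as follows. The pool of presampled subvectors is a hypothetical stand-in for offline outputs of the $\Lambda^\perp(G)$ coset sampler, and the sizes are toy values; the point is only that, once each column block $R_i$ of $R$ has been multiplied by the relevant subvectors offline, the online work is a sum of $n$ stored dimension-$\bar{m}$ vectors.

```python
import random

def matvec(M, x, q):
    """M x with entries reduced into [0, q)."""
    return [sum(a * b for a, b in zip(row, x)) % q for row in M]

rng = random.Random(3)
n, k, mbar, q = 3, 4, 10, 16
R = [[rng.choice((0, 0, 1, -1)) for _ in range(n * k)] for _ in range(mbar)]
R_blocks = [[row[i * k:(i + 1) * k] for row in R] for i in range(n)]   # R = [R_0 | ... | R_{n-1}]

# Offline: a pool of dimension-k subvectors (hypothetical stand-ins for samples
# from the Lambda^perp(G) coset sampler) and their products with every block.
pool = [[rng.randint(-3, 3) for _ in range(k)] for _ in range(8)]
table = [[matvec(Ri, c, q) for c in pool] for Ri in R_blocks]

# Online: z concatenates n chosen subvectors, so R z is a sum of n stored vectors
# (additions only, no multiplications).
choice = [rng.randrange(len(pool)) for _ in range(n)]
Rz_fast = [sum(table[i][choice[i]][r] for i in range(n)) % q for r in range(mbar)]

z = [x for i in range(n) for x in pool[choice[i]]]
assert Rz_fast == matvec(R, z, q)          # agrees with the direct product
```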

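Algorithm 3 Efficient algorithm $\mathsf{SampleD}^{\mathcal{O}}(R, \bar{A}, H, \mathbf{u}, s)$ for sampling a discrete Gaussian over $\Lambda_{\mathbf{u}}^\perp(A)$.
Input: An oracle $\mathcal{O}(\mathbf{v})$ for Gaussian sampling over a desired coset $\Lambda_{\mathbf{v}}^\perp(G)$ with fixed parameter $r\sqrt{\Sigma_G} \geq \eta_\epsilon(\Lambda^\perp(G))$, for some $\Sigma_G \geq 2$ and $\epsilon \leq 1/2$.

Offline phase:

• partial parity-check matrix $\bar{A} \in \mathbb{Z}_q^{n \times \bar{m}}$;

• trapdoor matrix $R \in \mathbb{Z}^{\bar{m} \times w}$;

• positive definite $\Sigma \geq \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right](2 + \Sigma_G)[R^t \; I]$, e.g., any $\Sigma = s^2 \geq (s_1(R)^2 + 1)(s_1(\Sigma_G) + 2)$.
Online phase:

• invertible tag $H \in \mathbb{Z}_q^{n \times n}$ defining $A = [\bar{A} \mid HG - \bar{A}R] \in \mathbb{Z}_q^{n \times m}$, for $m = \bar{m} + w$ ($H$ may instead be provided in the offline phase, if it is known then);

• syndrome $\mathbf{u} \in \mathbb{Z}_q^n$.

Output: A vector $\mathbf{x}$ drawn from a distribution within $O(\epsilon)$ statistical distance of $D_{\Lambda_{\mathbf{u}}^\perp(A), r \cdot \sqrt{\Sigma}}$.

Offline phase:

1: Choose a fresh perturbation $\mathbf{p} \leftarrow D_{\mathbb{Z}^m, r\sqrt{\Sigma_p}}$, where $\Sigma_p = \Sigma - \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\Sigma_G\,[R^t \; I] \geq 2\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right][R^t \; I]$.

2: Let $\mathbf{p} = \left[\begin{smallmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \end{smallmatrix}\right]$ for $\mathbf{p}_1 \in \mathbb{Z}^{\bar{m}}$, $\mathbf{p}_2 \in \mathbb{Z}^w$, and compute $\bar{\mathbf{w}} = \bar{A}(\mathbf{p}_1 - R\mathbf{p}_2) \in \mathbb{Z}_q^n$ and $\mathbf{w} = G\mathbf{p}_2 \in \mathbb{Z}_q^n$.

Online phase:

3: Let $\mathbf{v} \leftarrow H^{-1}(\mathbf{u} - \bar{\mathbf{w}}) - \mathbf{w} = H^{-1}(\mathbf{u} - A\mathbf{p}) \in \mathbb{Z}_q^n$, and choose $\mathbf{z} \leftarrow D_{\Lambda_{\mathbf{v}}^\perp(G), r\sqrt{\Sigma_G}}$ by calling $\mathcal{O}(\mathbf{v})$.
4: return $\mathbf{x} \leftarrow \mathbf{p} + \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\mathbf{z}$.

Theorem 5.5. Algorithm 3 is correct.

To prove the theorem we need the following fact about products of Gaussian functions.

Fact 5.6 (Product of degenerate Gaussians). Let $\Sigma_1, \Sigma_2 \in \mathbb{R}^{m \times m}$ be symmetric positive semidefinite matrices, let $V_i = \mathrm{span}(\Sigma_i)$ for $i = 1, 2$ and $V_3 = V_1 \cap V_2$, let $P = P^t \in \mathbb{R}^{m \times m}$ be the symmetric matrix that projects orthogonally onto $V_3$, and let $\mathbf{c}_1, \mathbf{c}_2 \in \mathbb{R}^m$ be arbitrary. Supposing it exists, let $\mathbf{v}$ be the unique point in $(V_1 + \mathbf{c}_1) \cap (V_2 + \mathbf{c}_2) \cap V_3^\perp$. Then

$$\rho_{\sqrt{\Sigma_1}}(\mathbf{x} - \mathbf{c}_1) \cdot \rho_{\sqrt{\Sigma_2}}(\mathbf{x} - \mathbf{c}_2) = \rho_{\sqrt{\Sigma_1 + \Sigma_2}}(\mathbf{c}_1 - \mathbf{c}_2) \cdot \rho_{\sqrt{\Sigma_3}}(\mathbf{x} - \mathbf{c}_3),$$

where $\Sigma_3$ and $\mathbf{c}_3 \in \mathbf{v} + V_3$ are such that

$$\Sigma_3^+ = P(\Sigma_1^+ + \Sigma_2^+)P, \qquad \Sigma_3^+(\mathbf{c}_3 - \mathbf{v}) = \Sigma_1^+(\mathbf{c}_1 - \mathbf{v}) + \Sigma_2^+(\mathbf{c}_2 - \mathbf{v}).$$

Proof of Theorem 5.5. We adopt the notation from the algorithm, let $V = \mathrm{span}(\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]) \subset \mathbb{R}^m$, let $P$ be the matrix that projects orthogonally onto $V$, and define the lattice $\Lambda = \mathbb{Z}^m \cap V = \mathcal{L}(\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right])$, which spans $V$. We analyze the output distribution of SampleD. Clearly, it always outputs an element of $\Lambda_{\mathbf{u}}^\perp(A)$, so let $\mathbf{x} \in \Lambda_{\mathbf{u}}^\perp(A)$ be arbitrary. Now SampleD outputs $\mathbf{x}$ exactly when it chooses in Step 1 some $\mathbf{p} \in \mathbb{Z}^m \cap (V + \mathbf{x})$, followed in Step 3 by the unique $\mathbf{z} \in \Lambda_{\mathbf{v}}^\perp(G)$ such that $\mathbf{x} - \mathbf{p} = \left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\mathbf{z}$. It is easy to check that $\rho_{\sqrt{\Sigma_G}}(\mathbf{z}) = \rho_{\sqrt{\Sigma_y}}(\mathbf{x} - \mathbf{p})$, where

$$\Sigma_y = \begin{bmatrix} R \\ I \end{bmatrix} \Sigma_G \begin{bmatrix} R^t & I \end{bmatrix} \geq 2 \begin{bmatrix} R \\ I \end{bmatrix}\begin{bmatrix} R^t & I \end{bmatrix}$$

is the covariance matrix with $\mathrm{span}(\Sigma_y) = V$. Note that $\Sigma_p + \Sigma_y = \Sigma$ by definition of $\Sigma_p$, and that $\mathrm{span}(\Sigma_p) = \mathbb{R}^m$ because $\Sigma_p > 0$. Therefore, we have (where $C$ denotes a normalizing constant that may vary from line to line, but does not depend on $\mathbf{x}$):

$$\begin{aligned}
p_{\mathbf{x}} = \Pr[\mathsf{SampleD}\text{ outputs }\mathbf{x}]
&= \sum_{\mathbf{p} \in \mathbb{Z}^m \cap (V + \mathbf{x})} D_{\mathbb{Z}^m, r\sqrt{\Sigma_p}}(\mathbf{p}) \cdot D_{\Lambda_{\mathbf{v}}^\perp(G), r\sqrt{\Sigma_G}}(\mathbf{z}) && \text{(def. of SampleD)} \\
&= C \sum_{\mathbf{p}} \rho_{r\sqrt{\Sigma_p}}(\mathbf{p}) \cdot \rho_{r\sqrt{\Sigma_y}}(\mathbf{p} - \mathbf{x}) / \rho_{r\sqrt{\Sigma_G}}(\Lambda_{\mathbf{v}}^\perp(G)) && \text{(def. of } D\text{)} \\
&= C \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x}) \cdot \sum_{\mathbf{p}} \rho_{r\sqrt{\Sigma_3}}(\mathbf{p} - \mathbf{c}_3) / \rho_{r\sqrt{\Sigma_G}}(\Lambda_{\mathbf{v}}^\perp(G)) && \text{(Fact 5.6)} \\
&\in C\left[1, \tfrac{1+\epsilon}{1-\epsilon}\right] \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x}) \cdot \sum_{\mathbf{p}} \rho_{r\sqrt{\Sigma_3}}(\mathbf{p} - \mathbf{c}_3) && \text{(Lemma 2.5 and } r\sqrt{\Sigma_G} \geq \eta_\epsilon(\Lambda^\perp(G))\text{)} \\
&= C\left[1, \tfrac{1+\epsilon}{1-\epsilon}\right] \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x}) \cdot \rho_{r\sqrt{\Sigma_3}}\big(\mathbb{Z}^m \cap (V + \mathbf{x}) - \mathbf{c}_3\big), && (5.1)
\end{aligned}$$

where $\Sigma_3^+ = P(\Sigma_p^+ + \Sigma_y^+)P$ and $\mathbf{c}_3 \in \mathbf{x} + V$. The latter holds because we apply Fact 5.6 with $\Sigma_1 = \Sigma_p$, $\mathbf{c}_1 = \mathbf{0}$, $\Sigma_2 = \Sigma_y$, $\mathbf{c}_2 = \mathbf{x}$, so the point called $\mathbf{v}$ in Fact 5.6 is the component of $\mathbf{x}$ orthogonal to $V$, i.e., the unique point in $(V + \mathbf{x}) \cap V^\perp$. Therefore,

$$\mathbb{Z}^m \cap (V + \mathbf{x}) - \mathbf{c}_3 = (\mathbb{Z}^m \cap V) + (\mathbf{x} - \mathbf{c}_3) \subset V$$

is a coset of the lattice $\Lambda = \mathcal{L}(\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right])$. It remains to show that $r\sqrt{\Sigma_3} \geq \eta_\epsilon(\Lambda)$, so that the rightmost term in (5.1) above is essentially a constant (up to some factor in $[\frac{1-\epsilon}{1+\epsilon}, 1]$) independent of $\mathbf{x}$, by Lemma 2.5. Then we can conclude that $p_{\mathbf{x}} \in C\left[\frac{1-\epsilon}{1+\epsilon}, \frac{1+\epsilon}{1-\epsilon}\right] \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x})$, from which the theorem follows.

To show that $r\sqrt{\Sigma_3} \geq \eta_\epsilon(\Lambda)$, note that since $\Lambda^* \subset V$, for any covariance $\Pi$ we have $\rho_{P\sqrt{\Pi}}(\Lambda^*) = \rho_{\sqrt{\Pi}}(\Lambda^*)$, and so $P\sqrt{\Pi} \geq \eta_\epsilon(\Lambda)$ if and only if $\sqrt{\Pi} \geq \eta_\epsilon(\Lambda)$. Now because both $\Sigma_p, \Sigma_y \geq 2\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right][R^t \; I]$, we have

$$\Sigma_p^+ + \Sigma_y^+ \leq \left(\begin{bmatrix} R \\ I \end{bmatrix}\begin{bmatrix} R^t & I \end{bmatrix}\right)^+.$$

Because $r\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right] \geq \eta_\epsilon(\Lambda)$ for $\epsilon = \mathrm{negl}(n)$ by Lemma 2.3, we have $r\sqrt{\Sigma_3} = r\sqrt{(\Sigma_p^+ + \Sigma_y^+)^+} \geq \eta_\epsilon(\Lambda)$, as desired. □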
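The following Python sketch mirrors the four steps of Algorithm 3 under simplifying assumptions that are ours, not the text's: $H = I$, $q = 2^k$, $\Sigma = s^2 I$, $\Sigma_G = 4I$ with a coset sampler that draws each coordinate from $D_{2\mathbb{Z}+c,\,2r}$ in the style of a power-of-two construction, and the perturbation $\mathbf{p}$ produced as in [Pei10] (a continuous Gaussian of parameter $r\sqrt{\Sigma_p - I}$ via a Cholesky factor, then coordinate-wise randomized rounding with parameter $r$). The parameter choice $s^2 = 6(\|R\|_F^2 + 1)$ satisfies $s^2 \geq (s_1(R)^2 + 1)(s_1(\Sigma_G) + 2)$ because $s_1(R) \leq \|R\|_F$. The integer sampler is plain rejection sampling with a tail cut, so the output distribution is only approximately right; the final check verifies just the coset condition $A\mathbf{x} = \mathbf{u}$. All names and sizes are hypothetical.

```python
import math
import random

def dgauss(center, sigma, rng, tail=12):
    """D_{Z,sigma,center} with rho_sigma(x) = exp(-pi x^2 / sigma^2), by rejection
    sampling on [center - tail*sigma, center + tail*sigma]."""
    lo, hi = math.floor(center - tail * sigma), math.ceil(center + tail * sigma)
    while True:
        y = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (y - center) ** 2 / sigma ** 2):
            return y

def gadget_sample(v, k, sigma, rng):
    """z in Z^k with sum_j 2^j z_j = v (mod 2^k); z_j <- D_{2Z+c, sigma}, c = parity."""
    z = []
    for _ in range(k):
        c = v % 2
        zj = 2 * dgauss(-c / 2, sigma / 2, rng) + c
        z.append(zj)
        v = (v - zj) // 2
    return z

def cholesky(M):
    """Lower-triangular L with L L^t = M; raises if M is not positive definite."""
    size = len(M)
    L = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                if acc <= 0:
                    raise ValueError("perturbation covariance is not positive definite")
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return L

def sample_d(Abar, R, u, k, r, s, rng):
    """SampleD sketch: tag H = I, q = 2^k, Sigma = s^2 I, Sigma_G = 4 I."""
    n, mbar, q = len(Abar), len(R), 1 << k
    w = n * k
    m = mbar + w
    Rbar = [row[:] for row in R] + [[int(i == j) for j in range(w)] for i in range(w)]
    # Step 1: p <- D_{Z^m, r sqrt(Sigma_p)}, Sigma_p = s^2 I - 4 Rbar Rbar^t: continuous
    # Gaussian with parameter r sqrt(Sigma_p - I), then rounding with parameter r.
    cov = [[r * r * ((s * s - 1 if i == j else 0)
                     - 4 * sum(a * b for a, b in zip(Rbar[i], Rbar[j])))
            for j in range(m)] for i in range(m)]
    L = cholesky(cov)
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    c = [sum(L[i][t] * g[t] for t in range(i + 1)) / math.sqrt(2 * math.pi)
         for i in range(m)]
    p = [dgauss(ci, r, rng) for ci in c]
    # Step 2: wbar = Abar (p1 - R p2) and w_vec = G p2 (mod q).
    p1, p2 = p[:mbar], p[mbar:]
    d = [p1[i] - sum(R[i][j] * p2[j] for j in range(w)) for i in range(mbar)]
    wbar = [sum(Abar[i][t] * d[t] for t in range(mbar)) % q for i in range(n)]
    w_vec = [sum(p2[i * k + j] << j for j in range(k)) % q for i in range(n)]
    # Step 3: v = u - A p, then z <- O(v) with parameter r sqrt(Sigma_G) = 2r.
    v = [(u[i] - wbar[i] - w_vec[i]) % q for i in range(n)]
    z = [zj for i in range(n) for zj in gadget_sample(v[i], k, 2 * r, rng)]
    # Step 4: x = p + [R; I] z.
    return [p[i] + sum(Rbar[i][j] * z[j] for j in range(w)) for i in range(m)]

rng = random.Random(11)
n, k, mbar = 2, 4, 8
q, w = 1 << k, n * k
Abar = [[rng.randrange(q) for _ in range(mbar)] for _ in range(n)]
R = [[rng.choice((0, 0, 1, -1)) for _ in range(w)] for _ in range(mbar)]
A = [Abar[i] + [(((1 << (j % k)) if j // k == i else 0)
                 - sum(Abar[i][t] * R[t][j] for t in range(mbar))) % q
                for j in range(w)] for i in range(n)]          # [Abar | G - Abar R]
r = 2.5                                                       # stand-in for omega(sqrt(log n))
s = math.sqrt(6 * (sum(x * x for row in R for x in row) + 1))
u = [rng.randrange(q) for _ in range(n)]
x = sample_d(Abar, R, u, k, r, s, rng)
assert [sum(A[i][j] * x[j] for j in range(mbar + w)) % q for i in range(n)] == u
```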

5.5  Trapdoor  Delegation 

Here we describe a very simple and efficient mechanism for securely delegating a trapdoor for $A \in \mathbb{Z}_q^{n \times m}$ to a trapdoor for an extension $A' \in \mathbb{Z}_q^{n \times m'}$ of $A$. Our method has several advantages over the previous basis delegation algorithm of [CHKP10]: first and most importantly, the size of the delegated trapdoor grows only linearly with the dimension $m'$ of $\Lambda^\perp(A')$, rather than quadratically. Second, the algorithm is much more efficient, because it does not require testing linear independence of Gaussian samples, nor computing the expensive ToBasis and Hermite normal form operations. Third, the resulting trapdoor $R$ has a "nice" Gaussian distribution that is easy to analyze and may be useful in applications. We do note that while the delegation algorithm from [CHKP10] works for any extension $A'$ of $A$ (including $A$ itself), ours requires $m' \geq m + w$. Fortunately, this is frequently the case in applications such as HIBE and others that use delegation.


Algorithm 4 Efficient algorithm $\mathsf{DelTrap}^{\mathcal{O}}(A' = [A \mid A_1], H', s')$ for delegating a trapdoor.

Input: an oracle $\mathcal{O}$ for discrete Gaussian sampling over cosets of $\Lambda = \Lambda^\perp(A)$ with parameter $s' \geq \eta_\epsilon(\Lambda)$.

• parity-check matrix $A' = [A \mid A_1] \in \mathbb{Z}_q^{n \times m} \times \mathbb{Z}_q^{n \times w}$;

• invertible matrix $H' \in \mathbb{Z}_q^{n \times n}$;

Output: a trapdoor $R' \in \mathbb{Z}^{m \times w}$ for $A'$ with tag $H' \in \mathbb{Z}_q^{n \times n}$.

1: Using $\mathcal{O}$, sample each column of $R'$ independently from a discrete Gaussian with parameter $s'$ over the appropriate coset of $\Lambda^\perp(A)$, so that $AR' = H'G - A_1$.


Usually, the oracle $\mathcal{O}$ needed by Algorithm 4 would be implemented (up to $\mathrm{negl}(n)$ statistical distance) by Algorithm 3 above, using a trapdoor $R$ for $A$ where $s_1(R)$ is sufficiently small relative to $s'$. The following is immediate from Lemma 2.9 and the fact that the columns of $R'$ are independent and $\mathrm{negl}(n)$-subgaussian. A relatively tight bound on the hidden constant factor can also be derived from Lemma 2.9.

Lemma 5.7. For any valid inputs $A'$ and $H'$, Algorithm 4 outputs a trapdoor $R'$ for $A'$ with tag $H'$, whose distribution is the same for any valid implementation of $\mathcal{O}$, and $s_1(R') \leq s' \cdot O(\sqrt{m} + \sqrt{w})$ except with negligible probability.
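The algebra of Algorithm 4 can be checked with a toy sketch. Here the oracle $\mathcal{O}$ is replaced by a deterministic preimage $\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\mathbf{z}$, with $\mathbf{z}$ the binary decomposition of each target syndrome; this is not a Gaussian sampler and it leaks $R$, so it only illustrates that $AR' = H'G - A_1$ makes $R'$ a trapdoor for $A'$ with tag $H'$. The gadget $G = I_n \otimes (1, 2, \ldots, 2^{k-1})$, the tag $H = I$ for $A$, and all names and sizes are assumptions of the sketch.

```python
import random

def gadget(n, k):
    """G = I_n (tensor) (1, 2, ..., 2^(k-1))."""
    return [[(1 << (j % k)) if j // k == i else 0 for j in range(n * k)]
            for i in range(n)]

def matmul(X, Y, q):
    """Product X * Y with entries reduced into [0, q)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]

rng = random.Random(9)
n, k, mbar = 2, 6, 12
q, w = 1 << k, n * k
m = mbar + w
G = gadget(n, k)
Abar = [[rng.randrange(q) for _ in range(mbar)] for _ in range(n)]
R = [[rng.choice((0, 0, 1, -1)) for _ in range(w)] for _ in range(mbar)]
AR = matmul(Abar, R, q)
A = [Abar[i] + [(G[i][j] - AR[i][j]) % q for j in range(w)] for i in range(n)]  # tag I
Rbar = R + [[int(i == j) for j in range(w)] for i in range(w)]                  # [R; I]

A1 = [[rng.randrange(q) for _ in range(w)] for _ in range(n)]   # extension A' = [A | A1]
Hp = [[1, 2], [0, 1]]                                           # invertible tag H'
HpG = matmul(Hp, G, q)
cols = []
for j in range(w):
    target = [(HpG[i][j] - A1[i][j]) % q for i in range(n)]      # column j of H'G - A1
    z = [(target[i] >> b) & 1 for i in range(n) for b in range(k)]  # G z = target
    # Stand-in for O: x = [R; I] z satisfies A x = G z = target (NOT Gaussian, leaks R).
    cols.append([sum(Rbar[r][c] * z[c] for c in range(w)) for r in range(m)])
Rp = [[cols[j][r] for j in range(w)] for r in range(m)]          # R' is m x w

Aprime = [A[i] + A1[i] for i in range(n)]
Rpbar = Rp + [[int(i == j) for j in range(w)] for i in range(w)]   # [R'; I]
assert matmul(Aprime, Rpbar, q) == HpG          # R' is a trapdoor for A' with tag H'
```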
6 Applications


The  main  applications  of  “strong”  trapdoors  have  included  digital  signature  schemes  in  both  the  random- 
oracle  and  standard  models,  encryption  secure  under  chosen-ciphertext  attack  (CCA),  and  (hierarchical) 
identity-based  encryption.  Here  we  focus  on  signature  schemes  and  CCA-secure  encryption,  where  our 
techniques  lead  to  significant  new  improvements  (beyond  what  is  obtained  by  plugging  in  our  trapdoor 
generator  as  a  “black  box”).  Where  appropriate,  we  also  briefly  mention  the  improvements  that  are  possible 
in  the  remaining  applications. 

6.1  Algebraic  Background 

In our applications we need a special collection of elements from a certain ring $\mathcal{R}$, which induce invertible matrices $H \in \mathbb{Z}_q^{n \times n}$ as required by our trapdoor construction. We construct such a ring using ideas from the literature on secret sharing over groups and modules, e.g., [DF94, Feh98]. Define the ring $\mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ for some monic degree-$n$ polynomial $f(x) = x^n + f_{n-1}x^{n-1} + \cdots + f_0 \in \mathbb{Z}[x]$ that is irreducible modulo every prime $p$ dividing $q$. (Such an $f(x)$ can be constructed by finding monic irreducible degree-$n$ polynomials in $\mathbb{Z}_p[x]$ for each prime $p$ dividing $q$, and using the Chinese remainder theorem on their coefficients to get $f(x)$.) Recall that $\mathcal{R}$ is a free $\mathbb{Z}_q$-module of rank $n$, i.e., the elements of $\mathcal{R}$ can be represented as vectors in $\mathbb{Z}_q^n$ relative to the standard basis of monomials $1, x, \ldots, x^{n-1}$. Multiplication by any fixed element of $\mathcal{R}$ then acts as a linear transformation on $\mathbb{Z}_q^n$ according to the rule $x \cdot (a_0, \ldots, a_{n-1})^t = (0, a_0, \ldots, a_{n-2})^t - a_{n-1}(f_0, f_1, \ldots, f_{n-1})^t$, and so can be represented by an (efficiently computable) matrix in $\mathbb{Z}_q^{n \times n}$ relative to the standard basis. In other words, there is an injective ring homomorphism $h \colon \mathcal{R} \to \mathbb{Z}_q^{n \times n}$ that maps any $a \in \mathcal{R}$ to the matrix $H = h(a)$ representing multiplication by $a$. In particular, $H$ is invertible if and only if $a \in \mathcal{R}^*$, the set of units in $\mathcal{R}$. By the Chinese remainder theorem, and because $\mathbb{Z}_p[x]/(f(x))$ is a field by construction of $f(x)$, an element $a \in \mathcal{R}$ is a unit exactly when it is nonzero (as a polynomial residue) modulo every prime $p$ dividing $q$. We use this fact quite essentially in the constructions that follow.
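A sketch of the map $h$ for the toy ring $\mathbb{Z}_5[x]/(x^2 + 3)$, a choice made only for illustration ($x^2 + 3 \equiv x^2 - 2$ is irreducible mod 5 because 2 is not a square mod 5). It builds $h(a)$ column by column from the multiplication-by-$x$ rule above (column $j$ holds the coefficients of $a \cdot x^j$), checks that $h$ is a ring homomorphism on all pairs of elements, and checks that $h(a)$ is invertible exactly when $a \neq 0$, as predicted for prime $q$. The list `f` holds $(f_0, \ldots, f_{n-1})$ without the leading 1; all function names are hypothetical.

```python
from itertools import product

def mul_by_x(a, f, q):
    """x * (a_0, ..., a_{n-1}) = (0, a_0, ..., a_{n-2}) - a_{n-1} (f_0, ..., f_{n-1})."""
    top = a[-1]
    return [(c - top * fi) % q for c, fi in zip([0] + a[:-1], f)]

def h(a, f, q):
    """Matrix of multiplication by a in Z_q[x]/(f(x)); column j holds a * x^j."""
    n = len(f)
    cols, col = [], [c % q for c in a]
    for _ in range(n):
        cols.append(col)
        col = mul_by_x(col, f, q)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matmul(X, Y, q):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]

def ring_mul(a, b, f, q):
    """a * b in the ring, computed as h(a) applied to the coefficients of b."""
    return [sum(x * y for x, y in zip(row, b)) % q for row in h(a, f, q)]

def det_mod_p(M, p):
    """Determinant of M modulo a prime p, by Gaussian elimination."""
    M, det = [row[:] for row in M], 1
    for c in range(len(M)):
        piv = next((r for r in range(c, len(M)) if M[r][c] % p), None)
        if piv is None:
            return 0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)
        for r in range(c + 1, len(M)):
            factor = M[r][c] * inv % p
            M[r] = [(x - factor * y) % p for x, y in zip(M[r], M[c])]
    return det % p

q, f = 5, [3, 0]                 # f(x) = x^2 + 3 = x^2 - 2, irreducible mod 5
elems = [list(t) for t in product(range(q), repeat=len(f))]
hom = all(matmul(h(a, f, q), h(b, f, q), q) == h(ring_mul(a, b, f, q), f, q)
          for a in elems for b in elems)
units_ok = all((det_mod_p(h(a, f, q), q) != 0) == any(a) for a in elems)
print(hom, units_ok)             # prints: True True
```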

6.2  Signature  Schemes 

6.2.1  Definitions 

A signature scheme SIG for a message space $\mathcal{M}$ (which may depend on the security parameter $n$) is a tuple of PPT algorithms as follows:

• $\mathsf{Gen}(1^n)$ outputs a verification key $vk$ and a signing key $sk$.

• $\mathsf{Sign}(sk, \mu)$, given a signing key $sk$ and a message $\mu \in \mathcal{M}$, outputs a signature $\sigma \in \{0,1\}^*$.

• $\mathsf{Ver}(vk, \mu, \sigma)$, given a verification key $vk$, a message $\mu$, and a signature $\sigma$, either accepts or rejects.

The correctness requirement is: for any $\mu \in \mathcal{M}$, generate $(vk, sk) \leftarrow \mathsf{Gen}(1^n)$ and $\sigma \leftarrow \mathsf{Sign}(sk, \mu)$. Then $\mathsf{Ver}(vk, \mu, \sigma)$ should accept with overwhelming probability (over all the randomness in the experiment).

We recall two standard notions of security for signatures. An intermediate notion, strong unforgeability under static chosen-message attack (su-scma security), is defined as follows: first, the forger $\mathcal{F}$ outputs a list of distinct query messages $\mu^{(1)}, \ldots, \mu^{(Q)}$ for some $Q$. (The distinctness condition simplifies our construction, and does not affect the notion's usefulness.) Next, we generate $(vk, sk) \leftarrow \mathsf{Gen}(1^n)$ and $\sigma^{(i)} \leftarrow \mathsf{Sign}(sk, \mu^{(i)})$ for each $i \in [Q]$, then give $vk$ and each $\sigma^{(i)}$ to $\mathcal{F}$. Finally, $\mathcal{F}$ outputs an attempted forgery $(\mu^*, \sigma^*)$. The forger's advantage $\mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F})$ is the probability that $\mathsf{Ver}(vk, \mu^*, \sigma^*)$ accepts and $(\mu^*, \sigma^*) \neq (\mu^{(i)}, \sigma^{(i)})$ for all $i \in [Q]$, taken over all the randomness of the experiment. The scheme is su-scma-secure if $\mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F}) = \mathrm{negl}(n)$ for every nonuniform probabilistic polynomial-time algorithm $\mathcal{F}$.

Another notion, called strong existential unforgeability under adaptive chosen-message attack, or su-acma security, is defined similarly, except that $\mathcal{F}$ is first given $vk$ and may adaptively choose the messages $\mu^{(i)}$ to be signed, which need not be distinct.

Using a family of chameleon hash functions, there is a generic transformation from eu-scma- to eu-acma-security; see, e.g., [KR00]. Furthermore, the transformation results in an offline/online scheme in which the Sign algorithm can be precomputed before the message to be signed is known; see [ST01]. The basic idea is that the signer chameleon hashes the true message, then signs the hash value using the eu-scma-secure scheme (and includes the randomness used in the chameleon hash with the final signature). A suitable type of chameleon hash function has been constructed under a weak hardness-of-SIS assumption; see [CHKP10].

6.2.2  Standard  Model  Scheme 

Here we give a signature scheme that is statically secure in the standard model. The scheme itself is essentially identical (up to the improved and generalized parameters) to the one of [Boy10], which is a lattice analogue of the pairing-based signature of [Wat05]. We give a new proof with an improved security reduction that relies on a weaker assumption. The proof uses a variant of the "prefix technique" [HW09] also used in [CHKP10].

Our scheme involves a number of parameters. For simplicity, we give some exemplary asymptotic bounds here. (Other slight trade-offs among the parameters are possible, and more precise values can be obtained using the more exact bounds from earlier in the paper and the material below.) In what follows, $\omega(\sqrt{\log n})$ represents a fixed function that asymptotically grows faster than $\sqrt{\log n}$.

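• $G \in \mathbb{Z}_q^{n \times nk}$ is a gadget matrix for large enough $q = \mathrm{poly}(n)$ and $k = \lceil \lg q \rceil = O(\log n)$, with the ability to sample from cosets of $\Lambda^\perp(G)$ with Gaussian parameter $O(1) \cdot \omega(\sqrt{\log n}) \geq \eta_\epsilon(\Lambda^\perp(G))$. (See for example the constructions from Section 4.)

• $\bar{m} = O(nk)$ and $\mathcal{D} = D_{\mathbb{Z}, \omega(\sqrt{\log n})}^{\bar{m} \times nk}$, so that $(\bar{A}, \bar{A}R)$ is $\mathrm{negl}(n)$-far from uniform for $\bar{A} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$ and $R \leftarrow \mathcal{D}$, and $m = \bar{m} + 2nk$ is the total dimension of the signatures.

• $\ell$ is a suitable message length (see below), and $s = O(\sqrt{\ell nk}) \cdot \omega(\sqrt{\log n})^2$ is a sufficiently large Gaussian parameter.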

The legal values of $\ell$ are influenced by the choice of $q$ and $n$. Our security proof requires a special collection of units in the ring $\mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ as constructed in Section 6.1 above. We need a sequence of $\ell$ units $u_1, \ldots, u_\ell \in \mathcal{R}^*$, not necessarily distinct, such that any nontrivial subset-sum is also a unit, i.e., for any nonempty $S \subseteq [\ell]$, $\sum_{i \in S} u_i \in \mathcal{R}^*$. By the characterization of units in $\mathcal{R}$ described in Section 6.1, letting $p$ be the smallest prime dividing $q$, we can allow any $\ell \leq (p - 1) \cdot n$ by taking $p - 1$ copies of each of the monomials $x^i \in \mathcal{R}^*$ for $i = 0, \ldots, n - 1$. (Any nonempty subset-sum is then $\sum_i c_i x^i$ with $0 \leq c_i \leq p - 1$ not all zero, which is nonzero modulo every prime dividing $q$.)
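The units construction can be checked exhaustively for tiny parameters. The sketch assumes $q = 15$ and $n = 2$ with some $f(x)$ irreducible modulo 3 and modulo 5 (for example $x^2 + 13$, which reduces to $x^2 + 1$ and $x^2 + 3$ respectively); it then applies the criterion from Section 6.1 (a unit is nonzero modulo every prime dividing $q$), so $f$ itself never appears in the code. With $p = 3$ it takes $p - 1 = 2$ copies of each monomial, giving $\ell = 4$, and shows that one extra copy breaks the subset-sum property. Function names are hypothetical.

```python
from itertools import combinations

def is_unit(a, primes):
    """Unit criterion from Section 6.1 (assumes f(x) irreducible modulo every prime p | q):
    a is a unit iff its coefficient vector is nonzero modulo every such p."""
    return all(any(c % p for c in a) for p in primes)

def monomial_units(n, p):
    """p - 1 copies of each monomial x^i (i = 0, ..., n-1), as coefficient vectors."""
    return [[int(j == i) for j in range(n)] for i in range(n) for _ in range(p - 1)]

def subset_sums_are_units(units, primes):
    """Exhaustively check that every nonempty subset of units sums to a unit."""
    for size in range(1, len(units) + 1):
        for S in combinations(units, size):
            if not is_unit([sum(col) for col in zip(*S)], primes):
                return False
    return True

primes, n = [3, 5], 2          # q = 15, so the smallest prime dividing q is p = 3
units = monomial_units(n, min(primes))            # ell = (p - 1) * n = 4
print(len(units), subset_sums_are_units(units, primes))        # prints: 4 True
print(subset_sums_are_units(units + [[1, 0]], primes))         # prints: False
```

The failing case is the subset $1 + 1 + 1 = 3 \equiv 0 \pmod 3$, which is exactly why at most $p - 1$ copies of each monomial are allowed.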

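The signature scheme has message space $\{0,1\}^\ell$, and is defined as follows.

• $\mathsf{Gen}(1^n)$: choose $\bar{A} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$, choose $R \in \mathbb{Z}^{\bar{m} \times nk}$ from distribution $\mathcal{D}$, and let $A = [\bar{A} \mid G - \bar{A}R]$. For $i = 0, 1, \ldots, \ell$, choose $A_i \leftarrow \mathbb{Z}_q^{n \times nk}$. Also choose a syndrome $\mathbf{u} \leftarrow \mathbb{Z}_q^n$.

The public verification key is $vk = (A, A_0, \ldots, A_\ell, \mathbf{u})$. The secret signing key is $sk = R$.

• $\mathsf{Sign}(sk, \mu \in \{0,1\}^\ell)$: let $A_\mu = [A \mid A_0 + \sum_{i \in [\ell]} \mu_i A_i] \in \mathbb{Z}_q^{n \times m}$, where $\mu_i \in \{0,1\}$ is the $i$th bit of $\mu$, interpreted as an integer. Output $\mathbf{v} \in \mathbb{Z}^m$ sampled from $D_{\Lambda_{\mathbf{u}}^\perp(A_\mu), s}$, using SampleD with trapdoor $R$ for $A$ (which is also a trapdoor for its extension $A_\mu$).

• $\mathsf{Ver}(vk, \mu, \mathbf{v})$: let $A_\mu$ be as above. Accept if $\|\mathbf{v}\| \leq s \cdot \sqrt{m}$ and $A_\mu \cdot \mathbf{v} = \mathbf{u}$; otherwise, reject.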

Notice that the signing process takes $O(\ell n^2 k)$ scalar operations (to add up the $A_i$'s), but after transforming the scheme to a fully secure one using chameleon hashing, these computations can be performed offline before the message is known.
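To make the roles of $A_\mu$ and Ver concrete, here is a toy Python sketch with hypothetical names and tiny, insecure sizes, assuming $q = 2^k$ and the power-of-two gadget. The signer is deliberately not SampleD: it picks a small random last block $\mathbf{t}$ and then the deterministic preimage $\left[\begin{smallmatrix} R \\ I \end{smallmatrix}\right]\mathbf{z}$ of $\mathbf{u} - (A_0 + \sum_i \mu_i A_i)\mathbf{t}$ under $A$, with $\mathbf{z}$ a binary decomposition. Such signatures leak $R$; the sketch only shows why a trapdoor for $A$ suffices to produce vectors accepted by Ver for $A_\mu$. The parameter $s = 5$ is an arbitrary toy value.

```python
import random

def a_mu(A, As, mu, q):
    """A_mu = [A | A_0 + sum_i mu_i A_i] (mod q) for a bit string mu of length ell."""
    acc = [row[:] for row in As[0]]
    for bit, Ai in zip(mu, As[1:]):
        if bit == "1":
            acc = [[(x + y) % q for x, y in zip(ra, rb)] for ra, rb in zip(acc, Ai)]
    return [ra + rb for ra, rb in zip(A, acc)]

def verify(vk, mu, v, s, q):
    """Ver: accept iff ||v|| <= s * sqrt(m) and A_mu v = u (mod q)."""
    A, As, u = vk
    Am = a_mu(A, As, mu, q)
    short = sum(c * c for c in v) <= s * s * len(v)
    return short and [sum(a * c for a, c in zip(row, v)) % q for row in Am] == u

def toy_sign(R, vk, mu, k, q, rng):
    """NOT SampleD: random small last block t, then the leaky preimage [R; I] z."""
    A, As, u = vk
    n, mbar, w = len(A), len(R), len(R[0])
    B = [row[mbar + w:] for row in a_mu(A, As, mu, q)]           # A_0 + sum mu_i A_i
    t = [rng.choice((-1, 0, 1)) for _ in range(w)]
    target = [(u[i] - sum(B[i][j] * t[j] for j in range(w))) % q for i in range(n)]
    z = [(target[i] >> b) & 1 for i in range(n) for b in range(k)]   # G z = target
    y = [sum(R[r][c] * z[c] for c in range(w)) for r in range(mbar)] + z
    return y + t                                                    # A y + B t = u

rng = random.Random(4)
n, k, mbar, ell = 2, 4, 8, 3
q, w = 1 << k, n * k
Abar = [[rng.randrange(q) for _ in range(mbar)] for _ in range(n)]
R = [[rng.choice((0, 0, 1, -1)) for _ in range(w)] for _ in range(mbar)]
A = [Abar[i] + [(((1 << (j % k)) if j // k == i else 0)
                 - sum(Abar[i][t] * R[t][j] for t in range(mbar))) % q
                for j in range(w)] for i in range(n)]               # [Abar | G - Abar R]
As = [[[rng.randrange(q) for _ in range(w)] for _ in range(n)] for _ in range(ell + 1)]
u = [rng.randrange(q) for _ in range(n)]
vk = (A, As, u)
v = toy_sign(R, vk, "101", k, q, rng)
print(len(v), verify(vk, "101", v, 5.0, q))   # prints: 24 True
```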

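Theorem 6.1. There exists a PPT oracle algorithm (a reduction) $\mathcal{S}$ attacking the $\mathrm{SIS}_{q,\beta}$ problem for large enough $\beta = O(\ell (nk)^{3/2}) \cdot \omega(\sqrt{\log n})^3$ such that, for any adversary $\mathcal{F}$ mounting an su-scma attack on SIG and making at most $Q$ queries,

$$\mathrm{Adv}_{\mathrm{SIS}_{q,\beta}}(\mathcal{S}^{\mathcal{F}}) \geq \mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F}) / (2(\ell - 1)Q + 2) - \mathrm{negl}(n).$$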

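Proof. Let $\mathcal{F}$ be an adversary mounting an su-scma attack on SIG, having advantage $\delta = \mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F})$. We construct a reduction $\mathcal{S}$ attacking $\mathrm{SIS}_{q,\beta}$. The reduction $\mathcal{S}$ takes as input $\bar{m} + nk + 1$ uniformly random and independent samples from $\mathbb{Z}_q^n$, parsing them as a matrix $A = [\bar{A} \mid B] \in \mathbb{Z}_q^{n \times (\bar{m} + nk)}$ and syndrome $\mathbf{u}' \in \mathbb{Z}_q^n$. It will use $\mathcal{F}$ either to find some $\mathbf{z} \in \mathbb{Z}^{\bar{m} + nk}$ of length $\|\mathbf{z}\| \leq \beta - 1$ such that $A\mathbf{z} = \mathbf{u}'$ (from which it follows that $[A \mid \mathbf{u}'] \cdot \mathbf{z}' = \mathbf{0}$, where $\mathbf{z}' = \left[\begin{smallmatrix} \mathbf{z} \\ -1 \end{smallmatrix}\right]$ is nonzero and of length at most $\beta$), or a nonzero $\mathbf{z} \in \mathbb{Z}^{\bar{m} + nk}$ such that $A\mathbf{z} = \mathbf{0}$ (from which it follows that $[A \mid \mathbf{u}'] \cdot \left[\begin{smallmatrix} \mathbf{z} \\ 0 \end{smallmatrix}\right] = \mathbf{0}$).

We distinguish between two types of forger $\mathcal{F}$: one that produces a forgery on an unqueried message (a violation of standard existential unforgeability), and one that produces a new signature on a queried message (a violation of strong unforgeability). Clearly any $\mathcal{F}$ with advantage $\delta$ has probability at least $\delta/2$ of succeeding in at least one of these two tasks.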

First we consider an $\mathcal{F}$ that forges on an unqueried message (with probability at least $\delta/2$). Our reduction $\mathcal{S}$ simulates the static chosen-message attack to $\mathcal{F}$ as follows:

• Invoke $\mathcal{F}$ to receive up to $Q$ messages $\mu^{(1)}, \mu^{(2)}, \ldots \in \{0,1\}^\ell$. Compute the set $P$ of all strings $p \in \{0,1\}^{\le \ell}$ having the property that $p$ is a shortest string for which no $\mu^{(j)}$ has $p$ as a prefix. Equivalently, $P$ represents the set of maximal subtrees of $\{0,1\}^{\le \ell}$ (viewed as a tree) that do not contain any of the queried messages. The set $P$ has size at most $(\ell-1)\cdot Q + 1$, and may be computed efficiently. (See, e.g., [CHKP10] for a precise description of an algorithm.) Choose some $p$ from $P$ uniformly at random, letting $t = |p| \le \ell$.

• Construct a verification key $vk = (A, A_0, \ldots, A_\ell, u = u')$: for $i = 0, \ldots, \ell$, choose $R_i \leftarrow \mathcal{D}$, and let $A_i = H_i G - \bar A R_i$, where

$$H_i = \begin{cases} 0 & i > t, \\ (-1)^{p_i}\cdot h(u_i) & i \in [t], \\ -\sum_{j\in[t]} p_j \cdot H_j & i = 0. \end{cases}$$

(Recall that $u_1, \ldots, u_\ell \in \mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ are units whose nontrivial subset-sums are also units.)

Note that by hypothesis on $\bar m$ and $\mathcal{D}$, for any choice of $p$ the key $vk$ is only negl(n)-far from uniform in statistical distance. Note also that by our choice of the $H_i$, for any message $\mu \in \{0,1\}^\ell$ having $p$ as a prefix, we have $H_0 + \sum_{i\in[\ell]} \mu_i H_i = 0$. Whereas for any $\mu \in \{0,1\}^\ell$ having $p' \neq p$ as its $t$-bit prefix, we have

$$H_0 + \sum_{i\in[\ell]} \mu_i H_i = \sum_{i\in[t]} (p'_i - p_i)\cdot(-1)^{p_i}\cdot h(u_i) = h\Big(\sum_{i\in[t],\ p'_i \neq p_i} u_i\Big),$$

which is invertible by hypothesis on the $u_i$s. Finally, observe that with overwhelming probability over any fixed choice of $vk$ and the $H_i$, each column of each $R_i$ is still independently distributed as a discrete Gaussian with parameter $\omega(\sqrt{\log n}) \ge \eta_\epsilon(\Lambda^\perp(\bar A))$ over some fixed coset of $\Lambda^\perp(\bar A)$, for some negligible $\epsilon = \epsilon(n)$.

• Generate signatures for the queried messages: for each message $\mu = \mu^{(j)}$, compute

$$A_\mu = \Big[A \;\Big|\; A_0 + \sum_{i\in[\ell]} \mu_i A_i\Big] = \Big[\bar A \;\Big|\; B \;\Big|\; HG - \bar A\Big(R_0 + \sum_{i\in[\ell]} \mu_i R_i\Big)\Big],$$

where $H$ is invertible because the $t$-bit prefix of $\mu$ is not $p$. Therefore, $R = R_0 + \sum_{i\in[\ell]} \mu_i R_i$ is a trapdoor for $A_\mu$. By the conditional distribution on the $R_i$s, concatenation of subgaussian random variables, and Lemma 2.9, we have

$$s_1(R) = \sqrt{\ell + 1}\cdot O(\sqrt{\bar m} + \sqrt{nk})\cdot\omega(\sqrt{\log n}) = O(\sqrt{\ell nk})\cdot\omega(\sqrt{\log n})$$

with overwhelming probability. Since $s = O(\sqrt{\ell nk})\cdot\omega(\sqrt{\log n})^2$ is sufficiently large, we can generate a properly distributed signature $v_\mu \leftarrow D_{\Lambda_u^\perp(A_\mu), s}$ using SampleD with trapdoor $R$.
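The set $P$ from the first step can be computed by walking the prefix tree of the queried messages. A string belongs to $P$ exactly when its parent is a prefix of some query but the string itself is not. The Python sketch below is one straightforward way to compute it, not necessarily the algorithm of [CHKP10]. Messages are modeled as bit strings, and `prefix_cover` is a hypothetical name. The sketch also checks the two properties the proof relies on: the size bound $(\ell-1)Q+1$, and the fact that every unqueried message has exactly one prefix in $P$.

```python
from itertools import product


def prefix_cover(queries, ell):
    """Set P of shortest strings p (|p| <= ell) that are a prefix of no queried message.

    A string p qualifies exactly when its parent prefixes some query but p does not,
    so P consists of the "off-path" children of the query paths in the binary tree.
    """
    if not queries:
        return {""}  # nothing queried: the whole tree is one query-free subtree
    on_path = {msg[:i] for msg in queries for i in range(ell + 1)}
    cover = set()
    for node in on_path:
        if len(node) < ell:
            for bit in "01":
                child = node + bit
                if child not in on_path:
                    cover.add(child)
    return cover


queries = ["000", "011"]
ell = 3
P = prefix_cover(queries, ell)
print(sorted(P))  # ['001', '010', '1']
assert len(P) <= (ell - 1) * len(queries) + 1

# Every unqueried message has exactly one prefix in P; queried messages have none.
for bits in product("01", repeat=ell):
    msg = "".join(bits)
    assert sum(msg.startswith(p) for p in P) == (0 if msg in queries else 1)
```

The uniqueness check is what makes "$\mu^*$ has $p$ as a prefix" an event of probability $1/|P|$ when $p$ is uniform in $P$.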

Next, $\mathcal{S}$ gives $vk$ and the generated signatures to $\mathcal{F}$. Because $vk$ and the signatures are distributed within negl(n) statistical distance of those in the real attack (for any choice of the prefix $p$), with probability at least $\delta/2 - \mathrm{negl}(n)$, $\mathcal{F}$ outputs a forgery $(\mu^*, v^*)$ where $\mu^*$ is different from all the queried messages, $A_{\mu^*} v^* = u$, and $\|v^*\| \le s\cdot\sqrt{m}$. Furthermore, conditioned on this event, $\mu^*$ has $p$ as a prefix with probability at least $1/((\ell-1)Q + 1) - \mathrm{negl}(n)$, because $p$ is still essentially uniform in $P$ conditioned on the view of $\mathcal{F}$. Therefore, all of these events occur with probability at least $\delta/(2(\ell-1)Q + 2) - \mathrm{negl}(n)$.

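In such a case, $\mathcal{S}$ extracts a solution to its SIS challenge instance from the forgery $(\mu^*, v^*)$ as follows. Because $\mu^*$ starts with $p$, we have $A_{\mu^*} = [\bar A \mid B \mid -\bar A R^*]$ for $R^* = R_0 + \sum_{i\in[\ell]} \mu^*_i R_i$, and so

$$[\bar A \mid B] \cdot \underbrace{\begin{bmatrix} I_{\bar m} & 0 & -R^* \\ 0 & I_{nk} & 0 \end{bmatrix} v^*}_{z} = u \bmod q,$$

as desired. Because $\|v^*\| \le s\cdot\sqrt{m} = O(\sqrt{\ell}\,nk)\cdot\omega(\sqrt{\log n})^2$ and $s_1(R^*) = \sqrt{\ell+1}\cdot O(\sqrt{\bar m} + \sqrt{nk})\cdot\omega(\sqrt{\log n})$ with overwhelming probability (conditioned on the view of $\mathcal{F}$ and any fixed $H_i$), we have $\|z\| = O(\ell(nk)^{3/2})\cdot\omega(\sqrt{\log n})^3$, which is at most $\beta - 1$, as desired.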
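The tag bookkeeping behind $H_0 + \sum_{i\in[\ell]} \mu_i H_i$ can be checked in a toy scalar model. The sketch assumes, purely for illustration, that the ring is $\mathbb{Z}_q$ for the prime $q = 257$, that $h$ is the identity (so each $H_i$ is a $1\times 1$ matrix), and that $u_i = 2^{i-1}$. Under these choices every nontrivial subset sum lies in $[1, 15]$ and is therefore invertible mod $q$. The check confirms that the tag vanishes exactly on messages with prefix $p$, which is why $A_{\mu^*}$ loses its $HG$ term in the extraction step. The helper names are hypothetical.

```python
from itertools import product


def tags(p, ell, h):
    """Toy H_0, ..., H_ell for a prefix p (bit string, len(p) = t <= ell).

    h[i - 1] plays the role of h(u_i); everything is a scalar here.
    """
    t = len(p)
    H = [0] * (ell + 1)  # H_i = 0 for i > t
    for i in range(1, t + 1):
        H[i] = (-1) ** int(p[i - 1]) * h[i - 1]  # H_i = (-1)^{p_i} h(u_i)
    H[0] = -sum(int(p[j - 1]) * H[j] for j in range(1, t + 1))
    return H


def H_mu(H, mu, q):
    """H_0 + sum_{i in [ell]} mu_i H_i (mod q)."""
    return (H[0] + sum(int(b) * H[i] for i, b in enumerate(mu, start=1))) % q


q, ell = 257, 4
h = [2 ** (i - 1) for i in range(1, ell + 1)]  # subset sums 1..15, all nonzero mod 257
p = "10"
H = tags(p, ell, h)
for bits in product("01", repeat=ell):
    mu = "".join(bits)
    value = H_mu(H, mu, q)
    # Zero exactly on messages with prefix p; otherwise the sum of h(u_i) over differing bits.
    expected = sum(h[i] for i in range(len(p)) if mu[i] != p[i]) % q
    assert value == expected
    assert (value == 0) == mu.startswith(p)
print(H_mu(H, "1011", q), H_mu(H, "0111", q))  # 0 3
```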

Now we consider an $\mathcal{F}$ that forges on one of its queried messages (with probability at least $\delta/2$). Our reduction $\mathcal{S}$ simulates the attack to $\mathcal{F}$ as follows:

• Invoke $\mathcal{F}$ to receive up to $Q$ distinct messages $\mu^{(1)}, \mu^{(2)}, \ldots \in \{0,1\}^\ell$. Choose one of these messages $\hat\mu = \mu^{(i)}$ uniformly at random, “guessing” that the eventual forgery will be on $\hat\mu$.

• Construct a verification key $vk = (A, A_0, \ldots, A_\ell, u)$: generate the $A_i$ exactly as above, using $p = \hat\mu$. Then choose $v \leftarrow D_{\mathbb{Z}^m, s}$ and let $u = A_{\hat\mu} v$, where $A_{\hat\mu}$ is defined in the usual way.

• Generate signatures for the queried messages: for all the queries except $\hat\mu$, proceed exactly as above (which is possible because all the queries are distinct and hence do not have $p = \hat\mu$ as a prefix). For $\hat\mu$, use $v$ as the signature, which has the required distribution $D_{\Lambda_u^\perp(A_{\hat\mu}), s}$ by construction.

When $\mathcal{S}$ gives $vk$ and the signatures to $\mathcal{F}$, with probability at least $\delta/2 - \mathrm{negl}(n)$ the forger must output a forgery $(\mu^*, v^*)$ where $\mu^*$ is one of its queries, $v^*$ is different from the corresponding signature it received, $A_{\mu^*} v^* = u$, and $\|v^*\| \le s\cdot\sqrt{m}$. Because $vk$ and the signatures are appropriately distributed for any choice $\hat\mu$ that $\mathcal{S}$ made, conditioned on the above event the probability that $\mu^* = \hat\mu$ is at least $1/Q - \mathrm{negl}(n)$. Therefore, all of these events occur with probability at least $\delta/(2Q) - \mathrm{negl}(n)$.

In such a case, $\mathcal{S}$ extracts a solution to its SIS challenge from the forgery as follows. Because $\mu^* = \hat\mu$, we have $A_{\mu^*} = [\bar A \mid B \mid -\bar A R^*]$ for $R^* = R_0 + \sum_{i\in[\ell]} \mu^*_i R_i$, and so

$$[\bar A \mid B] \cdot \underbrace{\begin{bmatrix} I_{\bar m} & 0 & -R^* \\ 0 & I_{nk} & 0 \end{bmatrix} (v^* - v)}_{z} = 0 \bmod q.$$

Because both $\|v^*\|, \|v\| \le s\cdot\sqrt{m} = O(\sqrt{\ell}\,nk)\cdot\omega(\sqrt{\log n})^2$ and $s_1(R^*) = O(\sqrt{\ell nk})\cdot\omega(\sqrt{\log n})$ with overwhelming probability (conditioned on the view of $\mathcal{F}$ and any fixed $H_i$), we have $\|z\| = O(\ell(nk)^{3/2})\cdot\omega(\sqrt{\log n})^3$ with overwhelming probability, as needed. It just remains to show that $z \neq 0$ with overwhelming probability. To see this, write $w = v^* - v = (w_1, w_2, w_3) \in \mathbb{Z}^{\bar m}\times\mathbb{Z}^{nk}\times\mathbb{Z}^{nk}$, with $w \neq 0$. If $w_2 \neq 0$ or $w_3 = 0$, then $z \neq 0$ and we are done. Otherwise, choose some entry of $w_3$ that is nonzero; without loss of generality say it is $w_m$. Let $r = (R_0)_{nk}$ be the last column of $R_0$. Now for any fixed values of $R_i$ for $i \in [\ell]$ and fixed first $nk - 1$ columns of $R_0$, we have $z = 0$ only if $r \cdot w_m = y \in \mathbb{Z}^{\bar m}$ for some fixed $y$. Conditioned on the adversary's view (specifically, the value $\bar A r$ revealed by $A_0$), $r$ is distributed as a discrete Gaussian of parameter $\ge 2\eta_\epsilon(\Lambda^\perp(\bar A))$ for some $\epsilon = \mathrm{negl}(n)$ over a coset of $\Lambda^\perp(\bar A)$. Then by Lemma 2.7, we have $r = y/w_m$ with only negligible probability, and we are done. □


6.3  Chosen  Ciphertext-Secure  Encryption 

Definitions. A public-key cryptosystem for a message space $\mathcal{M}$ (which may depend on the security parameter) is a tuple of algorithms as follows:

• Gen($1^n$) outputs a public encryption key $pk$ and a secret decryption key $sk$.

• Enc($pk$, $m$), given a public key $pk$ and a message $m \in \mathcal{M}$, outputs a ciphertext $c \in \{0,1\}^*$.

• Dec($sk$, $c$), given a decryption key $sk$ and a ciphertext $c$, outputs some $m \in \mathcal{M} \cup \{\bot\}$.

The correctness requirement is: for any $m \in \mathcal{M}$, generate $(pk, sk) \leftarrow \mathrm{Gen}(1^n)$ and $c \leftarrow \mathrm{Enc}(pk, m)$. Then Dec($sk$, $c$) should output $m$ with overwhelming probability (over all the randomness in the experiment).

We recall the two notions of security under chosen-ciphertext attacks. We start with the weaker notion of CCA1 (or “lunchtime”) security. Let $\mathcal{A}$ be any nonuniform probabilistic polynomial-time algorithm. First, we generate $(pk, sk) \leftarrow \mathrm{Gen}(1^n)$ and give $pk$ to $\mathcal{A}$. Next, we give $\mathcal{A}$ oracle access to the decryption procedure Dec($sk$, ·). Next, $\mathcal{A}$ outputs two messages $m_0, m_1 \in \mathcal{M}$ and is given a challenge ciphertext $c \leftarrow \mathrm{Enc}(pk, m_b)$ for either $b = 0$ or $b = 1$. The scheme is CCA1-secure if the views of $\mathcal{A}$ (i.e., the public key $pk$, the answers to its oracle queries, and the ciphertext $c$) for $b = 0$ versus $b = 1$ are computationally indistinguishable (i.e., $\mathcal{A}$'s acceptance probabilities for $b = 0$ versus $b = 1$ differ by only negl(n)). In the stronger CCA2 notion, after receiving the challenge ciphertext, $\mathcal{A}$ continues to have access to the decryption oracle Dec($sk$, ·) for any query not equal to the challenge ciphertext $c$; security is defined similarly.


Construction. To highlight the main new ideas, here we present a public-key encryption scheme that is CCA1-secure. Full CCA2 security can be obtained via relatively generic transformations using either strongly unforgeable one-time signatures [DDN00], or a message authentication code and a weak form of commitment [BCHK07]; we omit these details.

Our scheme involves a number of parameters, for which we give some exemplary asymptotic bounds. In what follows, $\omega(\sqrt{\log n})$ represents a fixed function that asymptotically grows faster than $\sqrt{\log n}$.

• $G \in \mathbb{Z}_q^{n\times nk}$ is a gadget matrix for large enough prime power $q = p^e = \mathrm{poly}(n)$ and $k = O(\log q) = O(\log n)$. We require an oracle $\mathcal{O}$ that solves LWE with respect to $\Lambda(G^t)$ for any error vector in some $\mathcal{P}_{1/2}(q\cdot B^{-t})$ where $\|B\| = O(1)$. (See for example the constructions from Section 4.)

• $\bar m = O(nk)$ and $\mathcal{D} = D_{\mathbb{Z},\omega(\sqrt{\log n})}^{\bar m\times nk}$ so that $(\bar A, \bar A R)$ is negl(n)-far from uniform for $\bar A \leftarrow \mathbb{Z}_q^{n\times\bar m}$ and $R \leftarrow \mathcal{D}$, and $m = \bar m + nk$ is the total dimension of the public key and ciphertext.

• $\alpha$ is an error rate for LWE, for sufficiently large $1/\alpha = O(nk)\cdot\omega(\sqrt{\log n})$.

Our scheme requires a special collection of elements in the ring $\mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ as constructed in Section 6.1 (recall that here $q = p^e$). We need a very large set $\mathcal{U} = \{u_1, \ldots, u_\ell\} \subset \mathcal{R}$ with the “unit differences” property: for any $i \neq j$, the difference $u_i - u_j \in \mathcal{R}^*$, and hence $h(u_i - u_j) = h(u_i) - h(u_j) \in \mathbb{Z}_q^{n\times n}$ is invertible. (Note that the $u_i$s need not all be units themselves.) Concretely, by the characterization of units in $\mathcal{R}$ given above, we take $\mathcal{U}$ to be all linear combinations of the monomials $1, x, \ldots, x^{n-1}$ with coefficients in $\{0, \ldots, p-1\}$, of which there are exactly $p^n$. Since the difference between any two such distinct elements is nonzero modulo $p$, it is a unit.
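The “unit differences” property of $\mathcal{U}$ can be verified by brute force in a tiny instance. The sketch assumes $p = 3$, $e = 2$ (so $q = 9$), $n = 2$ and $f(x) = x^2 + 1$, which is irreducible modulo 3. These are illustrative choices rather than parameters of the scheme, and the helper names are hypothetical. The sketch also shows that elements of $\mathcal{U}$ need not be units themselves ($0 \in \mathcal{U}$), and that a nonzero element divisible by $p$, such as 3, is not a unit.

```python
from itertools import product


def mul(a, b, q):
    """Multiply a0 + a1*x and b0 + b1*x in Z_q[x]/(x^2 + 1), using x^2 = -1."""
    return ((a[0] * b[0] - a[1] * b[1]) % q, (a[0] * b[1] + a[1] * b[0]) % q)


def is_unit(a, q):
    """Brute-force search for a multiplicative inverse in Z_q[x]/(x^2 + 1)."""
    return any(mul(a, b, q) == (1, 0) for b in product(range(q), repeat=2))


p, e = 3, 2
q = p ** e  # q = 9; f(x) = x^2 + 1 is irreducible mod 3
U = list(product(range(p), repeat=2))  # coefficient pairs with entries in {0, ..., p-1}
print(len(U))  # 9, i.e. p^n
diffs_ok = all(
    is_unit(((a[0] - b[0]) % q, (a[1] - b[1]) % q), q)
    for a in U for b in U if a != b
)
print(diffs_ok)  # True
print(is_unit((0, 0), q), is_unit((3, 0), q))  # False False
```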

The system has message space $\{0,1\}^{nk}$, which we map bijectively to the cosets of $\Lambda/2\Lambda$ for $\Lambda = \Lambda(G^t)$ via some function encode that is efficient to evaluate and invert. Concretely, letting $S \in \mathbb{Z}^{nk\times nk}$ be any basis of $\Lambda$, we can map $m \in \{0,1\}^{nk}$ to $\mathrm{encode}(m) = Sm \in \mathbb{Z}^{nk}$.
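The following Python sketch illustrates encode and its inverse on a toy 2-dimensional lattice. The basis $S$ is a hypothetical stand-in for a basis of $\Lambda(G^t)$. Decoding solves $Sx = v$ exactly with rational arithmetic, rejects $v \notin \Lambda$, and returns $x \bmod 2$, which identifies the coset of $v$ in $\Lambda/2\Lambda$. It sketches the bijection only, not the full decryption routine of the scheme.

```python
from fractions import Fraction

# Hypothetical basis S of a toy lattice Lambda = S Z^2 (columns are basis vectors).
S = [[2, 1],
     [0, 3]]


def lattice_point(x):
    """S x for an integer vector x."""
    return [S[0][0] * x[0] + S[0][1] * x[1], S[1][0] * x[0] + S[1][1] * x[1]]


def encode(m):
    """encode(m) = S m for m in {0,1}^2: one representative per coset of Lambda / 2 Lambda."""
    return lattice_point(m)


def decode(v):
    """encode^{-1} on cosets: m with v in encode(m) + 2 Lambda, or None if v is not in Lambda."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    # x = S^{-1} v via the 2x2 adjugate formula, computed exactly with fractions.
    x0 = Fraction(S[1][1] * v[0] - S[0][1] * v[1], det)
    x1 = Fraction(-S[1][0] * v[0] + S[0][0] * v[1], det)
    if x0.denominator != 1 or x1.denominator != 1:
        return None
    return (int(x0) % 2, int(x1) % 2)


for m in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    for y in [(0, 0), (5, -7), (-3, 2)]:
        v = [a + 2 * b for a, b in zip(encode(m), lattice_point(y))]  # encode(m) + 2 S y
        assert decode(v) == m
print(decode([1, 0]))  # None, since (1, 0) is not in Lambda
```

Decoding ignores any added element of $2\Lambda$, which is exactly what lets the $2(s^t h(u) G \bmod q)$ term be discarded during decryption.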

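• Gen($1^n$): choose $\bar A \leftarrow \mathbb{Z}_q^{n\times\bar m}$ and $R \leftarrow \mathcal{D}$, letting $A_1 = -\bar A R \bmod q$. The public key is $pk = A = [\bar A \mid A_1] \in \mathbb{Z}_q^{n\times m}$ and the secret key is $sk = R$.

• Enc($pk = [\bar A \mid A_1]$, $m \in \{0,1\}^{nk}$): choose nonzero $u \leftarrow \mathcal{U}$ and let $A_u = [\bar A \mid A_1 + h(u)G]$. Choose $s \leftarrow \mathbb{Z}_q^n$, $\bar e \leftarrow D_{\mathbb{Z},\alpha q}^{\bar m}$, and $e_1 \leftarrow D_{\mathbb{Z},s}^{nk}$ where $s^2 = (\|\bar e\|^2 + \bar m(\alpha q)^2)\cdot\omega(\sqrt{\log n})^2$. Let

$$b^t = 2(s^t A_u \bmod q) + e^t + (0, \mathrm{encode}(m))^t \bmod 2q,$$

where $e = (\bar e, e_1) \in \mathbb{Z}^m$ and $0$ has dimension $\bar m$. (Note the use of mod-$2q$ arithmetic: $2(s^t A_u \bmod q)$ is an element of the lattice $2\Lambda(A_u^t) \supseteq 2q\mathbb{Z}^m$.) Output the ciphertext $c = (u, b) \in \mathcal{U}\times\mathbb{Z}_{2q}^m$.

• Dec($sk = R$, $c = (u, b) \in \mathcal{U}\times\mathbb{Z}_{2q}^m$): Let $A_u = [\bar A \mid A_1 + h(u)G] = [\bar A \mid h(u)G - \bar A R]$.

1. If $c$ does not parse or $u = 0$, output $\bot$. Otherwise, call $\mathrm{Invert}^{\mathcal{O}}(R, A_u, b \bmod q)$ to get values $z \in \mathbb{Z}_q^n$ and $e = (\bar e, e_1) \in \mathbb{Z}^{\bar m}\times\mathbb{Z}^{nk}$ for which $b^t = z^t A_u + e^t \bmod q$. (Note that $h(u) \in \mathbb{Z}_q^{n\times n}$ is invertible, as required by Invert.) If the call to Invert fails for any reason, output $\bot$.

2. If $\|\bar e\| > \alpha q\sqrt{\bar m}$ or $\|e_1\| > \alpha q\sqrt{2\bar m nk}\cdot\omega(\sqrt{\log n})$, output $\bot$.

3. Let $v = b - e \bmod 2q$, parsed as $v = (\bar v, v_1) \in \mathbb{Z}_{2q}^{\bar m}\times\mathbb{Z}_{2q}^{nk}$. If $v \notin 2\Lambda(A_u^t)$, output $\bot$. Finally, output $\mathrm{encode}^{-1}\big(v^t \begin{bmatrix} R \\ I \end{bmatrix} \bmod 2q\big) \in \{0,1\}^{nk}$ if it exists, otherwise output $\bot$.

(In practice, to avoid timing attacks one would perform all of the Dec operations first, and only then finally output $\bot$ if any of the validity tests failed.)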

Lemma 6.2. The above scheme has only $2^{-\Omega(n)}$ probability of decryption error.

The error probability can be made zero by changing Gen and Enc so that they resample $R$, $\bar e$, and/or $e_1$ in the rare event that they violate the corresponding bounds given in the proof below.
Proof. Let $(A, R) \leftarrow \mathrm{Gen}(1^n)$. By Lemma 2.9, we have $s_1(R) \le O(\sqrt{nk})\cdot\omega(\sqrt{\log n})$ except with probability $2^{-\Omega(n)}$. Now consider the random choices made by Enc($A$, $m$) for arbitrary $m \in \{0,1\}^{nk}$. By Lemma 2.6, we have both $\|\bar e\| < \alpha q\sqrt{\bar m}$ and $\|e_1\| < \alpha q\sqrt{2\bar m nk}\cdot\omega(\sqrt{\log n})$, except with probability $2^{-\Omega(n)}$. Letting $e = (\bar e, e_1)$, we have

$$\Big\|e^t \begin{bmatrix} R \\ I \end{bmatrix}\Big\| \le \|\bar e^t R\| + \|e_1\| \le \alpha q\cdot O(nk)\cdot\omega(\sqrt{\log n}).$$

In particular, for large enough $1/\alpha = O(nk)\cdot\omega(\sqrt{\log n})$ we have $e^t \begin{bmatrix} R \\ I \end{bmatrix} \in \mathcal{P}_{1/2}(q\cdot B^{-t})$. Therefore, the call to Invert made by Dec($R$, $(u, b)$) returns $e$. It follows that for $v = (\bar v, v_1) = b - e \bmod 2q$, we have $v \in 2\Lambda(A_u^t)$ as needed. Finally,

$$v^t \begin{bmatrix} R \\ I \end{bmatrix} = 2(s^t h(u) G \bmod q) + \mathrm{encode}(m)^t \bmod 2q,$$

which lies in the coset $\mathrm{encode}(m) + 2\Lambda(G^t)$ of $\Lambda(G^t)/2\Lambda(G^t)$, and so Dec outputs $m$ as desired. □

Theorem 6.3. The above scheme is CCA1-secure assuming the hardness of decision-$\mathrm{LWE}_{q,\alpha'}$ for $\alpha' = \alpha/3 \ge 2\sqrt{n}/q$.

Proof. We start by giving a particular form of discretized LWE that we will need below. Given access to an LWE distribution $A_{s,\alpha'}$ over $\mathbb{Z}_q^n\times\mathbb{T}$ for any $s \in \mathbb{Z}_q^n$ (where recall that $\mathbb{T} = \mathbb{R}/\mathbb{Z}$), by [Pei10, Theorem 3.1] we can transform its samples $(a, b = \langle s, a\rangle/q + e \bmod 1)$ to have the form $(a, 2(\langle s, a\rangle \bmod q) + e' \bmod 2q)$ for $e' \leftarrow D_{\mathbb{Z},\alpha q}$, by mapping $b \mapsto 2qb + D_{\mathbb{Z}-2qb,\, s} \bmod 2q$ where $s^2 = (\alpha q)^2 - (2\alpha' q)^2 \ge 4n \ge \eta_\epsilon(\mathbb{Z})^2$. This transformation maps the uniform distribution over $\mathbb{Z}_q^n\times\mathbb{T}$ to the uniform distribution over $\mathbb{Z}_q^n\times\mathbb{Z}_{2q}$, so the discretized distribution is pseudorandom under the hypothesis of the theorem.
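The randomized-rounding map $b \mapsto 2qb + D_{\mathbb{Z}-2qb, s} \bmod 2q$ can be sketched in a few lines. Sampling $y$ from the discrete Gaussian over $\mathbb{Z}$ centered at $c = 2qb$ is the same as adding a sample of $D_{\mathbb{Z}-c, s}$ to $c$. The sketch rests on several assumptions of its own, none of which come from the source: a simple rejection sampler truncated at $12s$ (adequate for illustration, but neither constant-time nor cryptographically vetted), floating-point arithmetic, and toy values for $q$, $s$ and the error. The function names are hypothetical.

```python
import math
import random


def sample_dgauss_int(center, s, rng, tail=12.0):
    """Rejection-sample y from D_{Z,s,center}, with density proportional to
    exp(-pi * (y - center)^2 / s^2), truncated to |y - center| <= tail * s (toy sampler)."""
    lo = math.floor(center - tail * s)
    hi = math.ceil(center + tail * s)
    while True:
        y = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (y - center) ** 2 / s ** 2):
            return y


def discretize(b, q, s, rng):
    """Map b in T = R/Z to 2qb + D_{Z - 2qb, s} mod 2q.

    Sampling y from D_{Z,s,c} with c = 2qb is the same as adding x <- D_{Z-c,s} to c.
    """
    c = 2 * q * (b % 1.0)
    return sample_dgauss_int(c, s, rng) % (2 * q)


rng = random.Random(0)
q = 97
inner = 33    # stands in for <s, a> mod q
e = 0.0004    # small continuous LWE error
b = (inner / q + e) % 1.0
out = discretize(b, q, 3.0, rng)
print(0 <= out < 2 * q, abs(out - 2 * inner) <= 40)  # True True
```

The output is $2\langle s, a\rangle \bmod 2q$ plus a small integer error, which is the form used for the challenge ciphertext below.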

We proceed via a sequence of hybrid games. The game $H_0$ is exactly the CCA1 attack with the system described above.

In game $H_1$, we change how the public key $A$ and challenge ciphertext $c^* = (u^*, b^*)$ are constructed, and the way that decryption queries are answered (slightly), but in a way that introduces only negl(n) statistical difference with $H_0$. At the start of the experiment we choose nonzero $u^* \leftarrow \mathcal{U}$ and let the public key be $A = [\bar A \mid A_1] = [\bar A \mid -h(u^*)G - \bar A R]$, where $\bar A$ and $R$ are chosen in the same way as in $H_0$. (In particular, we still have $s_1(R) \le O(\sqrt{nk})\cdot\omega(\sqrt{\log n})$ with overwhelming probability.) Note that $A$ is still negl(n)-uniform for any choice of $u^*$, so conditioned on any fixed choice of $A$, the value of $u^*$ is statistically hidden from the attacker. To aid with decryption queries, we also choose an arbitrary (not necessarily short) $\hat R \in \mathbb{Z}^{\bar m\times nk}$ such that $A_1 = -\bar A\hat R \bmod q$.

To answer a decryption query on a ciphertext $(u, b)$, we use an algorithm very similar to Dec with trapdoor $R$. After testing whether $u = 0$ (and outputting $\bot$ if so), we call $\mathrm{Invert}^{\mathcal{O}}(R, A_u, b \bmod q)$ to get some $z \in \mathbb{Z}_q^n$ and $e \in \mathbb{Z}^m$, where

$$A_u = [\bar A \mid A_1 + h(u)G] = [\bar A \mid h(u - u^*)G - \bar A R].$$

(If Invert fails, we output $\bot$.) We then perform steps 2 and 3 on $e \in \mathbb{Z}^m$ and $v = b - e \bmod 2q$ exactly as in Dec, except that we use $\hat R$ in place of $R$ when decoding the message in step 3.

We now analyze the behavior of this decryption routine. Whenever $u \neq u^*$, which is the case with overwhelming probability because $u^*$ is statistically hidden, by the “unit differences” property on $\mathcal{U}$ we have that $h(u - u^*) \in \mathbb{Z}_q^{n\times n}$ is invertible, as required by the call to Invert. Now, either there exists an $e$ that satisfies the validity tests in step 2 and such that $b^t = z^t A_u + e^t \bmod q$ for some $z \in \mathbb{Z}_q^n$, or there does not. In the latter case, no matter what Invert does in $H_0$ and $H_1$, step 2 will return $\bot$ in both games. Now consider the former case: by the constraints on $e$, we have $e^t \begin{bmatrix} R \\ I \end{bmatrix} \in \mathcal{P}_{1/2}(q\cdot B^{-t})$ in both games, so the call to Invert must return this $e$ (but possibly different $z$) in both games. Finally, the result of decryption is the same in both games: if $v \in 2\Lambda(A_u^t)$ (otherwise, both games return $\bot$), then we can express $v$ as

$$v^t = 2(s^t A_u \bmod q) + (0, v')^t \bmod 2q$$

for some $s \in \mathbb{Z}_q^n$ and $v' \in \mathbb{Z}_{2q}^{nk}$. Then for any solution $R \in \mathbb{Z}^{\bar m\times nk}$ to $A_1 = -\bar A R \bmod q$, we have

$$v^t \begin{bmatrix} R \\ I \end{bmatrix} = 2(s^t h(u) G \bmod q) + (v')^t \bmod 2q.$$

In particular, this holds for the $R$ in $H_0$ and the $\hat R$ in $H_1$ that are used for decryption. It follows that both games output $\mathrm{encode}^{-1}(v')$, if it exists (and $\bot$ otherwise).

Finally, in $H_1$ we produce the challenge ciphertext $(u, b)$ on a message $m \in \{0,1\}^{nk}$ as follows. Let $u = u^*$, and choose $s \leftarrow \mathbb{Z}_q^n$ and $\bar e \leftarrow D_{\mathbb{Z},\alpha q}^{\bar m}$ as usual, but do not choose $e_1$. Note that $A_u = [\bar A \mid -\bar A R]$. Let $\bar b^t = 2(s^t \bar A \bmod q) + \bar e^t \bmod 2q$. Let

$$b_1^t = -\bar b^t R + \hat e^t + \mathrm{encode}(m)^t \bmod 2q,$$

where $\hat e \leftarrow D_{\mathbb{Z},\,\alpha q\sqrt{\bar m}\cdot\omega(\sqrt{\log n})}^{nk}$, and output $(u, b = (\bar b, b_1))$. We now show that the distribution of $(u, b)$ is within negl(n) statistical distance of that in $H_0$, given the attacker's view (i.e., $pk$ and the results of the decryption queries). Clearly, $u$ and $\bar b$ have essentially the same distribution as in $H_0$, because $u$ is negl(n)-uniform given $pk$, and by construction of $\bar b$. By substitution, we have

$$b_1^t = 2(s^t(-\bar A R) \bmod q) + (-\bar e^t R + \hat e^t) + \mathrm{encode}(m)^t \bmod 2q.$$

Therefore, it suffices to show that for fixed $\bar e$, each $-\langle \bar e, r_i\rangle + \hat e_i$ has distribution negl(n)-far from $D_{\mathbb{Z},s}$, where $s^2 = (\|\bar e\|^2 + \bar m(\alpha q)^2)\cdot\omega(\sqrt{\log n})^2$, over the random choice of $r_i$ (conditioned on the value of $\bar A r_i$ from the public key) and of $\hat e_i$. Because each $r_i$ is an independent discrete Gaussian over a coset of $\Lambda^\perp(\bar A)$, the claim follows essentially by [Reg05, Corollary 3.10], but adapted to discrete random variables using [Pei10, Theorem 3.1] in place of [Reg05, Claim 3.9].

In game $H_2$, we only change how the $\bar b$ component of the challenge ciphertext is created, letting it be uniformly random in $\mathbb{Z}_{2q}^{\bar m}$. We construct $pk$, answer decryption queries, and construct $b_1$ in exactly the same way as in $H_1$. First observe that under our (discretized) LWE hardness assumption, games $H_1$ and $H_2$ are computationally indistinguishable by an elementary reduction: given $(\bar A, \bar b) \in \mathbb{Z}_q^{n\times\bar m}\times\mathbb{Z}_{2q}^{\bar m}$ where $\bar A$ is uniformly random and either $\bar b^t = 2(s^t \bar A \bmod q) + \bar e^t \bmod 2q$ (for $s \leftarrow \mathbb{Z}_q^n$ and $\bar e \leftarrow D_{\mathbb{Z},\alpha q}^{\bar m}$) or $\bar b$ is uniformly random, we can efficiently emulate either game $H_1$ or $H_2$ (respectively) by doing everything exactly as in the two games, except using the given $\bar A$ and $\bar b$ when constructing the public key and challenge ciphertext.

Now by the leftover hash lemma, $(\bar A, \bar b^t, \bar A R, -\bar b^t R)$ is negl(n)-uniform when $R$ is chosen as in $H_2$. Therefore, the challenge ciphertext has the same distribution (up to negl(n) statistical distance) for any encrypted message, and so the adversary's advantage is negligible. This completes the proof. □
Abstract

We describe a working implementation of leveled homomorphic encryption (without bootstrapping) that can evaluate the AES-128 circuit in three different ways. One variant takes just over 36 hours to evaluate an entire AES encryption operation, using NTL (over GMP) as our underlying software platform, and running on a large-memory machine. Using SIMD techniques, we can process over 54 blocks in each evaluation, yielding an amortized rate of just under 40 minutes per block. Another implementation takes just over two and a half days to evaluate the AES operation, but can process 720 blocks in each evaluation, yielding an amortized rate of just over five minutes per block. We also detail a third implementation, which theoretically could yield even better amortized complexity, but in practice turns out to be less competitive.
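The amortized rates above are simple ratios of total running time to the number of blocks processed per evaluation. The short sketch below reproduces them from the abstract's rounded figures (36 hours for 54 blocks, and two and a half days for 720 blocks). The 16-slots-per-block packing behind $\lfloor 864/16 \rfloor = 54$ is taken from the Introduction. The helper names are hypothetical.

```python
def blocks_per_ciphertext(slots, slots_per_block=16):
    """Blocks packed per ciphertext when each AES state uses 16 slots (one byte per slot)."""
    return slots // slots_per_block


def amortized_minutes(total_hours, blocks):
    """Wall-clock time of one homomorphic AES evaluation, divided by blocks processed."""
    return total_hours * 60 / blocks


print(blocks_per_ciphertext(864))        # 54
print(amortized_minutes(36, 54))         # 40.0
print(amortized_minutes(2.5 * 24, 720))  # 5.0
```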

For our implementations we develop both AES-specific optimizations as well as several “generic” tools for FHE evaluation. These last tools include (among others) a different variant of the Brakerski-Vaikuntanathan key-switching technique that does not require reducing the norm of the ciphertext vector, and a method of implementing the Brakerski-Gentry-Vaikuntanathan modulus-switching transformation on ciphertexts in CRT representation.
1 Introduction


In his breakthrough result [13], Gentry demonstrated that fully-homomorphic encryption was theoretically possible, assuming the hardness of some problems in integer lattices. Since then, many different improvements have been made; for example, authors have proposed new variants, improved efficiency, suggested other hardness assumptions, etc. Some of these works were accompanied by implementations [26, 14, 8, 27, 19, 9], but all the implementations so far were either “proofs of concept” that can compute only one basic operation at a time (at great cost), or special-purpose implementations limited to evaluating very simple functions. In this work we report on the first implementation powerful enough to support an “interesting real world circuit”. Specifically, we implemented a variant of the leveled FHE-without-bootstrapping scheme of Brakerski, Gentry, and Vaikuntanathan [5] (BGV), with support for deep enough circuits so that we can evaluate an entire AES-128 encryption operation.

Why AES? We chose to shoot for an evaluation of AES since it seems like a natural benchmark: AES is widely deployed and used extensively in security-aware applications (so it is “practically relevant” to implement it), and the AES circuit is nontrivial on one hand, but on the other hand not astronomical. Moreover, the AES circuit has a regular (and quite “algebraic”) structure, which is amenable to parallelism and optimizations. Indeed, for these same reasons AES is often used as a benchmark for implementations of protocols for secure multi-party computation (MPC), for example [24, 10, 17, 18]. Using the same yardstick to measure FHE and MPC protocols is quite natural, since these techniques target similar application domains and in some cases both techniques can be used to solve the same problem.

Beyond being a natural benchmark, homomorphic evaluation of AES decryption also has interesting applications: when data is encrypted under AES and we want to compute on that data, homomorphic AES decryption would transform this AES-encrypted data into FHE-encrypted data, and then we could perform whatever computation we wanted. (Such applications were alluded to in [19, 27, 6].)

Why BGV? Our implementation is based on the (ring-LWE-based) BGV cryptosystem [5], which at present is one of three variants that seem the most likely to yield “somewhat practical” homomorphic encryption. The other two are the NTRU-like cryptosystem of Lopez-Alt et al. [21] and the ring-LWE-based fixed-modulus cryptosystem of Brakerski [4]. (These two variants were not yet available when we started our implementation effort.) These three different variants offer somewhat different implementation tradeoffs, but they all have similar performance characteristics. At present we do not know which of them will end up being faster in practice, but the differences are unlikely to be very significant. Moreover, we note that most of our optimizations for BGV are useful also for the other two variants.

Our Contributions. Our implementation is based on a variant of the BGV scheme [5, 7, 6] (based on ring-LWE [22]), using the techniques of Smart and Vercauteren (SV) [27] and Gentry, Halevi and Smart (GHS) [15], and we introduce many new optimizations. Some of our optimizations are specific to AES; these are described in Section 4. Most of our optimizations, however, are more general-purpose and can be used for homomorphic evaluation of other circuits; these are described in Section 3.

Many of our general-purpose optimizations are aimed at reducing the number of FFTs and CRTs that we need to perform, by reducing the number of times that we need to convert polynomials between coefficient and evaluation representations. Since the cryptosystem is defined over a polynomial ring, many of the operations involve various manipulations of integer polynomials, such as modular multiplications and additions and Frobenius maps. Most of these operations can be performed more efficiently in evaluation representation, where a polynomial is represented by the vector of values that it assumes at all the roots of the ring polynomial (for example, polynomial multiplication is just point-wise multiplication of the evaluation values). On the other hand, some operations in BGV-type cryptosystems (such as key switching and modulus switching) seem to require coefficient representation, where a polynomial is represented by listing all its coefficients.¹ Hence a “naive implementation” of FHE would need to convert the polynomials back and forth between the two representations, and these conversions turn out to be the most time-consuming part of the execution. In our implementation we keep ciphertexts in evaluation representation at all times, converting to coefficient representation only when needed for some operation, and then converting back.
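A toy example shows why evaluation representation makes multiplication cheap. The sketch assumes $m = 8$ and $q = 17$, so that $\zeta = 2$ is a primitive 8th root of unity modulo 17 and $\Phi_8(X) = X^4 + 1$. These parameters and the helper names are illustrative only. It converts by naive evaluation rather than by an FFT, and it checks that evaluating a product computed in coefficient representation agrees with the point-wise product of the two evaluation vectors.

```python
from math import gcd

M, Q, ZETA = 8, 17, 2  # toy parameters: 2 has multiplicative order 8 modulo 17
UNITS = [i for i in range(1, M) if gcd(i, M) == 1]  # (Z/8Z)^* = [1, 3, 5, 7]
ROOTS = [pow(ZETA, i, Q) for i in UNITS]             # the roots of Phi_8(X) = X^4 + 1 mod 17
N = len(UNITS)                                       # phi(8) = 4 coefficients


def mul_coeff(a, b):
    """Schoolbook product in Z_Q[X]/(X^4 + 1), coefficient representation."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:  # reduce using X^4 = -1
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


def to_eval(a):
    """Evaluation representation: the values of a at every root of Phi_8 modulo Q."""
    return [sum(coef * pow(r, k, Q) for k, coef in enumerate(a)) % Q for r in ROOTS]


a, b = [3, 1, 4, 1], [5, 9, 2, 6]
lhs = to_eval(mul_coeff(a, b))                               # multiply, then convert
rhs = [(x * y) % Q for x, y in zip(to_eval(a), to_eval(b))]  # point-wise product
print(lhs == rhs)  # True
```

The point-wise product costs $\phi(m)$ multiplications, while the coefficient-side product is quadratic here (or needs an FFT in practice). Converting back and forth is the cost the implementation tries to avoid.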

We describe variants of key switching and modulus switching that can be implemented while keeping almost all the polynomials in evaluation representation. Our key-switching variant has another advantage, in that it significantly reduces the size of the key-switching matrices in the public key. This is particularly important since the main limiting factor for evaluating deep circuits turns out to be the ability to keep the key-switching matrices in memory. Other optimizations that we present are meant to reduce the number of modulus switching and key switching operations that we need to do. This is done by tweaking some operations (such as multiplication by constant) to get a slower noise increase, by “batching” some operations before applying key switching, and by attaching to each ciphertext an estimate of the “noisiness” of this ciphertext, in order to support better noise bookkeeping.

Our Implementation. Our implementation was based on the NTL C++ library running over GMP. We utilized a machine which consisted of a processing unit of Intel Xeon CPUs running at 2.0 GHz with 18MB cache, and most importantly with 256GB of RAM.²

Memory was our main limiting factor in the implementation. With this machine it took us just under two days to compute a single block AES encryption using an implementation choice which minimizes the amount of memory required; this is roughly two orders of magnitude faster than what could be done with the Gentry-Halevi implementation [14]. The computation was performed on ciphertexts that could hold 864 plaintext slots each, where each slot holds an element of $\mathbb{F}_{2^8}$. This means that we can compute $\lfloor 864/16 \rfloor = 54$ AES operations in parallel, which gives an amortized time per block of roughly forty minutes. A second (byte-sliced) implementation, requiring more memory, completed an AES operation in around five days, where ciphertexts could hold 720 different $\mathbb{F}_{2^8}$ slots (hence we can evaluate 720 blocks in parallel). This results in an amortized time per block of roughly five minutes.

We note that there are a multitude of optimizations that one can perform on our basic implementation. Most importantly, we believe that by using the “bootstrapping as optimization” technique from BGV [5] we can speed up the AES performance by an additional order of magnitude. Also, there are great gains to be had by making better use of parallelism. Unfortunately, the NTL library (which serves as our underlying software platform) is not thread safe, which severely limits our ability to utilize the multi-core functionality of modern processors (our test machine has 24 cores). We expect that by utilizing many threads we can speed up some of our (higher memory) AES variants by as much as a 16x factor, just by letting each thread compute a different S-box lookup.

Organization.  In  Section  2  we  review  the  main  features  of  BGV-type  cryptosystems  [6,  5],  and  briefly 
survey  the  techniques  for  homomorphic  computation  on  packed  ciphertexts  from  SV  and  GHS  [27,  15]. 

1  The  need  for  coefficient  representation  ultimately  stems  from  the  fact  that  the  noise  in  the  ciphertexts  is  small  in  coefficient 
representation  but  not  in  evaluation  representation. 

2 This machine was BlueCrystal Phase 2, and the authors would like to thank the University of Bristol's Advanced Computing
Research Centre (https://www.acrc.bris.ac.uk/) for access to this facility.
Then in Section 3 we describe our “general-purpose” optimizations on a high level, with additional details
provided in Appendices A and B. A brief overview of AES, a high-level description of our implementation, and performance
numbers are provided in Section 4.


2  Background 

2.1  Notations  and  Mathematical  Background 

For an integer $q$ we identify the ring $\mathbb{Z}/q\mathbb{Z}$ with the interval $(-q/2, q/2] \cap \mathbb{Z}$, and use $[z]_q$ to denote the
reduction of the integer $z$ modulo $q$ into that interval. Our implementation utilizes polynomial rings defined
by cyclotomic polynomials, $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$. The ring $\mathbb{A}$ is the ring of integers of the $m$th cyclotomic
number field $\mathbb{Q}(\zeta_m)$. We let $\mathbb{A}_q \stackrel{\mathrm{def}}{=} \mathbb{A}/q\mathbb{A} = \mathbb{Z}[X]/(\Phi_m(X), q)$ for the (possibly composite) integer $q$, and
we identify $\mathbb{A}_q$ with the set of integer polynomials of degree up to $\phi(m) - 1$ reduced modulo $q$.

Coefficient vs. Evaluation Representation. Let $m, q$ be two integers such that $\mathbb{Z}/q\mathbb{Z}$ contains a primitive
$m$-th root of unity, and denote one such primitive $m$-th root of unity by $\zeta \in \mathbb{Z}/q\mathbb{Z}$. Recall that the $m$'th
cyclotomic polynomial splits into linear terms modulo $q$, $\Phi_m(X) = \prod_{i \in (\mathbb{Z}/m\mathbb{Z})^*} (X - \zeta^i) \pmod q$.

We consider two ways of representing an element $a \in \mathbb{A}_q$. Viewing $a$ as a degree-$(\phi(m)-1)$ polynomial,
$a(X) = \sum_{i<\phi(m)} a_i X^i$, the coefficient representation of $a$ just lists all the coefficients in order,
$\mathbf{a} = (a_0, a_1, \dots, a_{\phi(m)-1}) \in (\mathbb{Z}/q\mathbb{Z})^{\phi(m)}$. For the other representation we consider the values that the polynomial
$a(X)$ assumes on all primitive $m$-th roots of unity modulo $q$, $b_i = a(\zeta^i) \bmod q$ for $i \in (\mathbb{Z}/m\mathbb{Z})^*$. The
$b_i$'s in order also yield a vector $\mathbf{b} \in (\mathbb{Z}/q\mathbb{Z})^{\phi(m)}$, which we call the evaluation representation of $a$. Clearly
these two representations are related via $\mathbf{b} = V_m \cdot \mathbf{a}$, where $V_m$ is the Vandermonde matrix over the primitive
$m$-th roots of unity modulo $q$. We remark that for all $i$ we have the equality $(a \bmod (X - \zeta^i)) = a(\zeta^i) = b_i$,
hence the evaluation representation of $a$ is just a polynomial Chinese-Remaindering representation.
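To make the two representations concrete, here is a minimal Python sketch under toy assumptions of our own (not the implementation's parameters): $m = 8$, so $\Phi_8(X) = X^4 + 1$ and $\phi(m) = 4$; the prime $q = 17$, so that $m \mid q - 1$; and $\zeta = 2$, which has order 8 modulo 17. The helper names are hypothetical. The sketch checks that evaluating at the primitive roots agrees with multiplying by the Vandermonde matrix $V_m$, and that ring multiplication becomes slot-wise multiplication in evaluation representation.

```python
from math import gcd

# Toy parameters (illustration only): m = 8, Phi_8(X) = X^4 + 1, phi(m) = 4,
# q = 17 is prime with 8 | q - 1, and zeta = 2 has order 8 mod 17 (2^4 = -1).
M, Q, ZETA = 8, 17, 2
UNITS = [i for i in range(1, M) if gcd(i, M) == 1]   # (Z/mZ)^* = [1, 3, 5, 7]
PHI = len(UNITS)

def poly_eval(a, x):
    """Evaluate a(x) = sum_i a[i] x^i mod Q with Horner's rule."""
    acc = 0
    for coeff in reversed(a):
        acc = (acc * x + coeff) % Q
    return acc

def to_eval(a):
    """Coefficient representation -> evaluation representation b_i = a(zeta^i)."""
    return [poly_eval(a, pow(ZETA, i, Q)) for i in UNITS]

# The same map written as b = V_m * a, with V_m[r][k] = (zeta^{i_r})^k.
V = [[pow(ZETA, i * k, Q) for k in range(PHI)] for i in UNITS]

def to_eval_vandermonde(a):
    return [sum(v * x for v, x in zip(row, a)) % Q for row in V]

def mul_coeff(a, b):
    """Multiply in Z_q[X]/(X^4 + 1): a negacyclic convolution, since X^4 = -1."""
    out = [0] * PHI
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            k = i + j
            if k < PHI:
                out[k] = (out[k] + x * y) % Q
            else:
                out[k - PHI] = (out[k - PHI] - x * y) % Q
    return out

a, b = [1, 2, 0, 0], [3, 0, 5, 1]
assert to_eval(a) == to_eval_vandermonde(a)
# Ring multiplication is slot-wise multiplication in evaluation form.
assert to_eval(mul_coeff(a, b)) == [x * y % Q for x, y in zip(to_eval(a), to_eval(b))]
```

The negacyclic product in `mul_coeff` is specific to power-of-two $m$, where $\Phi_m(X) = X^{\phi(m)} + 1$; for other $m$ one would reduce modulo the full cyclotomic polynomial.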

In both representations, an element $a \in \mathbb{A}_q$ is represented by a $\phi(m)$-vector of integers in $\mathbb{Z}/q\mathbb{Z}$. If $q$ is
a composite then each of these integers can itself be represented either using the standard binary encoding
of integers or using Chinese-Remaindering relative to the factors of $q$. We usually use the standard binary
encoding for the coefficient representation and Chinese-Remaindering for the evaluation representation.
(Hence the latter representation is really a double-CRT representation, relative to both the polynomial factors
of $\Phi_m(X)$ and the integer factors of $q$.)
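The double-CRT form can be sketched the same way. In this illustrative Python snippet (toy parameters assumed, far smaller than anything used in practice), $q = 17 \cdot 41$ is a product of two primes that are both $1 \bmod 8$, and an element of $\mathbb{A}_q$ is stored as a $\phi(m) \times 2$ table of evaluations, one column per prime. Multiplication then becomes entry-wise and never touches an integer as large as $q$.

```python
from math import gcd

# Toy double-CRT parameters (illustration only): m = 8, Phi_8(X) = X^4 + 1,
# q = 17 * 41; each prime p = 1 (mod 8) is paired with a primitive 8th root.
M, N = 8, 4
ROOTS = {17: 2, 41: 3}          # 2^4 = -1 mod 17 and 3^4 = -1 mod 41
PRIMES = list(ROOTS)
Q = 17 * 41
UNITS = [i for i in range(1, M) if gcd(i, M) == 1]     # [1, 3, 5, 7]

def to_double_crt(a):
    """phi(m) x (number of primes) table: entry (i, j) = a(zeta_j^i) mod p_j."""
    return [[sum(c * pow(ROOTS[p], i * k, p) for k, c in enumerate(a)) % p
             for p in PRIMES]
            for i in UNITS]

def mul_coeff_mod_q(a, b):
    """Reference product in Z_q[X]/(X^4 + 1), on plain coefficients mod q."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1                  # X^4 = -1
            out[(i + j) % N] = (out[(i + j) % N] + sign * a[i] * b[j]) % Q
    return out

def mul_double_crt(ta, tb):
    """Entry-wise product; column j is reduced modulo its own prime p_j."""
    return [[x * y % p for x, y, p in zip(ra, rb, PRIMES)] for ra, rb in zip(ta, tb)]

a, b = [5, 600, 0, 123], [696, 1, 42, 9]
print(to_double_crt(a))
```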

2.2  BGV-type  Cryptosystems 

Our implementation uses a variant of the BGV cryptosystem due to Gentry, Halevi and Smart, specifically
the one described in [15, Appendix D] (in the full version). In this cryptosystem both ciphertexts and secret
keys are vectors over the polynomial ring $\mathbb{A}$, and the native plaintext space is the space of binary polynomials
$\mathbb{A}_2$. (More generally it could be $\mathbb{A}_p$ for some fixed $p > 2$, but in our case we will always use $\mathbb{A}_2$.)

At any point during the homomorphic evaluation there is some “current integer modulus $q$” and “current
secret key $\mathbf{s}$”, which change from time to time. A ciphertext $\mathbf{c}$ is decrypted using the current secret key $\mathbf{s}$
by taking the inner product over $\mathbb{A}_q$ (with $q$ the current modulus) and then reducing the result modulo 2 in
coefficient representation. Namely, the decryption formula is

$$a \leftarrow \Big[\, \underbrace{\big[\langle \mathbf{c}, \mathbf{s} \rangle \bmod \Phi_m(X)\big]_q}_{\text{noise}} \,\Big]_2 . \qquad (1)$$
The polynomial $[\langle \mathbf{c}, \mathbf{s} \rangle \bmod \Phi_m(X)]_q$ is called the “noise” in the ciphertext $\mathbf{c}$. Informally, $\mathbf{c}$ is a valid
ciphertext with respect to secret key $\mathbf{s}$ and modulus $q$ if this noise has “sufficiently small norm” relative
to $q$. The meaning of “sufficiently small norm” is whatever is needed to ensure that the noise does not wrap
around $q$ when performing homomorphic operations; in our implementation we keep the norm of the noise
always below some pre-set bound (which is determined in Appendix C.2).
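A toy illustration of formula (1) in Python, with insecure parameters chosen purely for readability ($\Phi_m(X) = X^4 + 1$, $q = 97$) and a secret key of the assumed form $\mathbf{s} = (1, s_1)$. The way the ciphertext is produced below is a hypothetical stand-in for encryption, not the key generation and encryption procedures of Appendix B: it only arranges that $\langle \mathbf{c}, \mathbf{s} \rangle = 2e + a$ over $\mathbb{A}_q$ for a small $e$, so the centered reduction recovers the noise exactly and its parity recovers $a$.

```python
import random

# Toy, insecure parameters chosen only to illustrate formula (1):
# A_q = Z_q[X]/(X^4 + 1), i.e. m = 8, with a small modulus q = 97.
N, Q = 4, 97

def ring_mul(a, b, q):
    """Multiply two polynomials in Z_q[X]/(X^N + 1), using X^N = -1."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1
            out[(i + j) % N] = (out[(i + j) % N] + sign * a[i] * b[j]) % q
    return out

def center(x, q):
    """[x]_q: reduce x modulo q into the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def decrypt(c, s, q):
    """Formula (1): a = [[<c, s> mod Phi_m(X)]_q]_2, coefficient-wise."""
    acc = [0] * N
    for ci, si in zip(c, s):
        acc = [(u + v) % q for u, v in zip(acc, ring_mul(ci, si, q))]
    noise = [center(x, q) for x in acc]          # the "noise" polynomial
    return [x % 2 for x in noise], noise

# Hypothetical stand-in for encryption: pick c1 at random, then solve for c0
# so that <c, s> = 2e + a over A_q for a small error polynomial e.
rng = random.Random(1)
s = [[1, 0, 0, 0], [rng.choice([-1, 0, 1]) for _ in range(N)]]   # s = (1, s1)
a = [1, 0, 1, 1]                                                 # plaintext in A_2
e = [rng.choice([-1, 0, 1]) for _ in range(N)]
c1 = [rng.randrange(Q) for _ in range(N)]
c0 = [(2 * ei + ai - x) % Q for ei, ai, x in zip(e, a, ring_mul(c1, s[1], Q))]

plaintext, noise = decrypt([c0, c1], s, Q)
print(plaintext, noise)
```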

Following [22, 15], the specific norm that we use to evaluate the magnitude of the noise is the “canonical
embedding norm reduced mod $q$”; specifically, we use the conventions as described in [15, Appendix D] (in
the full version). This is useful to get smaller parameters, but for the purpose of presentation the reader can
think of the norm as the Euclidean norm of the noise in coefficient representation. More details are given in
the Appendices. We refer to the norm of the noise as the noise magnitude.

The central feature of BGV-type cryptosystems is that the current secret key and modulus evolve as
we apply operations to ciphertexts. We apply five different operations to ciphertexts during homomorphic
evaluation. Three of them (addition, multiplication, and automorphism) are “semantic operations” that
we use to evolve the plaintext data which is encrypted under those ciphertexts. The other two operations
(key-switching and modulus-switching) are used for “maintenance”: these operations do not change
the plaintext at all, they only change the current key or modulus (respectively), and they are mainly used
to control the complexity of the evaluation. Below we briefly describe each of these five operations on a
high level. For the sake of self-containment, we also describe key generation and encryption in Appendix B.
A more detailed description can be found in [15, Appendix D].

Addition. Homomorphic addition of two ciphertext vectors with respect to the same secret key and modulus
$q$ is done just by adding the vectors over $\mathbb{A}_q$. If the two arguments were encrypting the plaintext
polynomials $a_1, a_2 \in \mathbb{A}_2$, then the sum will be an encryption of $a_1 + a_2 \in \mathbb{A}_2$. This operation has no effect
on the current modulus or key, and the norm of the noise is at most the sum of norms from the noise in the
two arguments.

Multiplication. Homomorphic multiplication is done via tensor product over $\mathbb{A}_q$. In principle, if the two
arguments have dimension $n$ over $\mathbb{A}_q$ then the product ciphertext has dimension $n^2$, each entry in the output
computed as the product of one entry from the first argument and one entry from the second.³

This operation does not change the current modulus, but it changes the current key: if the two input
ciphertexts are valid with respect to the dimension-$n$ secret key vector $\mathbf{s}$, encrypting the plaintext polynomials
$a_1, a_2 \in \mathbb{A}_2$, then the output is valid with respect to the dimension-$n^2$ secret key $\mathbf{s}'$ which is the tensor
product of $\mathbf{s}$ with itself, and it encrypts the polynomial $a_1 \cdot a_2 \in \mathbb{A}_2$. The norm of the noise in the product
ciphertext can be bounded in terms of the product of norms of the noise in the two arguments. For our choice
of norm function, the norm of the product is no larger than the product of the norms of the two arguments.

Key  Switching.  The  public  key  of  BGV-type  cryptosystems  includes  additional  components  to  enable 
converting  a  valid  ciphertext  with  respect  to  one  key  into  a  valid  ciphertext  encrypting  the  same  plaintext 
with  respect  to  another  key.  For  example,  this  is  used  to  convert  the  product  ciphertext  which  is  valid  with 
respect  to  a  high-dimension  key  back  to  a  ciphertext  with  respect  to  the  original  low-dimension  key. 

To allow conversion from dimension-$n'$ key $\mathbf{s}'$ to dimension-$n$ key $\mathbf{s}$ (both with respect to the same
modulus $q$), we include in the public key a matrix $W = W[\mathbf{s}' \to \mathbf{s}]$ over $\mathbb{A}_q$, where the $i$'th column of $W$ is
roughly an encryption of the $i$'th entry of $\mathbf{s}'$ with respect to $\mathbf{s}$ (and the current modulus). Then given a valid
ciphertext $\mathbf{c}'$ with respect to $\mathbf{s}'$, we roughly compute $\mathbf{c} = W \cdot \mathbf{c}'$ to get a valid ciphertext with respect to $\mathbf{s}$.

³ It was shown in [7] that over polynomial rings this operation can be implemented while increasing the dimension only to $2n - 1$
rather than to $n^2$.
In some more detail, the BGV key-switching transformation first ensures that the norm of the ciphertext
$\mathbf{c}'$ itself is sufficiently low with respect to $q$. In [5] this was done by working with the binary encoding of
$\mathbf{c}'$, and one of our main optimizations in this work is a different method for achieving the same goal (cf.
Section 3.1). Then, if the $i$'th entry in $\mathbf{s}'$ is $s'_i \in \mathbb{A}$ (with norm smaller than $q$), the $i$'th column of
$W[\mathbf{s}' \to \mathbf{s}]$ is an $n$-vector $\mathbf{w}_i$ such that $[\langle \mathbf{w}_i, \mathbf{s} \rangle \bmod \Phi_m(X)]_q = 2e_i + s'_i$ for a low-norm polynomial
$e_i \in \mathbb{A}$. Denoting $\mathbf{e} = (e_1, \dots, e_{n'})$, this means that we have $\mathbf{s}W = \mathbf{s}' + 2\mathbf{e}$ over $\mathbb{A}_q$. For any ciphertext
vector $\mathbf{c}'$, setting $\mathbf{c} = W \cdot \mathbf{c}' \in \mathbb{A}_q$ we get the equation

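$$[\langle \mathbf{c}, \mathbf{s} \rangle \bmod \Phi_m(X)]_q = [\mathbf{s} W \mathbf{c}' \bmod \Phi_m(X)]_q = [\langle \mathbf{c}', \mathbf{s}' \rangle + 2\langle \mathbf{c}', \mathbf{e} \rangle \bmod \Phi_m(X)]_q .$$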

Since $\mathbf{c}'$, $\mathbf{e}$, and $[\langle \mathbf{c}', \mathbf{s}' \rangle \bmod \Phi_m(X)]_q$ all have low norm relative to $q$, the addition on the right-hand
side does not cause a wrap around $q$, hence we get $[[\langle \mathbf{c}, \mathbf{s} \rangle \bmod \Phi_m(X)]_q]_2 = [[\langle \mathbf{c}', \mathbf{s}' \rangle \bmod \Phi_m(X)]_q]_2$, as
needed. The key-switching operation changes the current secret key from $\mathbf{s}'$ to $\mathbf{s}$, and does not change the
current modulus. The norm of the noise is increased by at most an additive factor of $2\|\langle \mathbf{c}', \mathbf{e} \rangle\|$.
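The identity behind key switching can be checked numerically. The following Python sketch makes a simplification for brevity: it collapses the ring $\mathbb{A}_q$ to the integers modulo $q$, and the keys, error terms, and modulus are arbitrary toy values (the source key `s_prime` is hypothetical). It builds $W[\mathbf{s}' \to \mathbf{s}]$ column by column so that $\langle \mathbf{w}_i, \mathbf{s} \rangle = 2e_i + s'_i$, and confirms $\langle W\mathbf{c}', \mathbf{s} \rangle = \langle \mathbf{c}', \mathbf{s}' \rangle + 2\langle \mathbf{c}', \mathbf{e} \rangle \pmod q$.

```python
import random

# Simplification for brevity: the ring A_q is collapsed to the integers mod Q.
Q = 2**20 + 7                      # arbitrary toy modulus
rng = random.Random(0)

def inner(u, v):
    """<u, v> mod Q."""
    return sum(x * y for x, y in zip(u, v)) % Q

s = [1, rng.choice([-1, 1])]       # target key s = (1, s1), dimension n = 2
s_prime = [1, 5, -3]               # hypothetical source key s', dimension n' = 3
e = [rng.choice([-1, 0, 1]) for _ in s_prime]    # small errors e_i

# Column i of W[s' -> s]: w_i = (b_i, a_i) with <w_i, s> = 2 e_i + s'_i (mod Q).
W_cols = []
for sp_i, e_i in zip(s_prime, e):
    a_i = rng.randrange(Q)
    W_cols.append([(2 * e_i + sp_i - a_i * s[1]) % Q, a_i])

def key_switch(c_prime):
    """c = W * c' (W stored as its columns); the result has dimension n = 2."""
    return [sum(col[row] * x for col, x in zip(W_cols, c_prime)) % Q for row in range(2)]

c_prime = [rng.randrange(Q) for _ in s_prime]
c = key_switch(c_prime)
print(inner(c, s) == (inner(c_prime, s_prime) + 2 * inner(c_prime, e)) % Q)
```

Here $\mathbf{c}'$ is uniformly random, so the extra term $2\langle \mathbf{c}', \mathbf{e} \rangle$ is not small; this is exactly why the transformation must first ensure that $\mathbf{c}'$ has low norm.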

Modulus Switching. The modulus-switching operation is intended to reduce the norm of the noise, to
compensate for the noise increase that results from all the other operations. To convert a ciphertext $\mathbf{c}$ with
respect to secret key $\mathbf{s}$ and modulus $q$ into a ciphertext $\mathbf{c}'$ encrypting the same thing with respect to the same
secret key but modulus $q'$, we roughly just scale $\mathbf{c}$ by a factor $q'/q$ (thus getting a fractional ciphertext),
then round appropriately to get back an integer ciphertext. Specifically, $\mathbf{c}'$ is a ciphertext vector satisfying
(a) $\mathbf{c}' \equiv \mathbf{c} \pmod 2$, and (b) the “rounding error term” $\boldsymbol{\tau} \stackrel{\mathrm{def}}{=} \mathbf{c}' - (q'/q)\mathbf{c}$ has low norm. Converting $\mathbf{c}$
to $\mathbf{c}'$ is easy in coefficient representation, and one of our optimizations is a method for doing the same in
evaluation representation (cf. Section 3.2). This operation leaves the current key $\mathbf{s}$ unchanged, changes the
current modulus from $q$ to $q'$, and the norm of the noise is changed as $\|n'\| \le (q'/q)\|n\| + \|\boldsymbol{\tau} \cdot \mathbf{s}\|$. Note that
if the key $\mathbf{s}$ has low norm and $q'$ is sufficiently smaller than $q$, then the noise magnitude decreases by this
operation.
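The rounding rule for modulus switching can be sketched coefficient-wise. This Python sketch assumes centered representatives in $(-q/2, q/2]$, odd moduli $q$ and $q'$, and the degenerate toy case where ring elements are single integers; all parameter values are illustrative. For each coefficient it picks the integer of the right parity that is closest to $(q'/q)c$, so every entry of $\boldsymbol{\tau}$ has magnitude at most 1.

```python
import random
from fractions import Fraction

def center(x, q):
    """Representative of x modulo q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def switch_modulus(c, q, q_new):
    """Scale each coefficient by q_new/q and round to the closest integer with
    the same parity as the (centered) original, so c_new = c (mod 2) and each
    rounding error |c_new - (q_new/q) c| is at most 1."""
    out = []
    for v in c:
        x = center(v, q)
        target = Fraction(q_new, q) * x
        y = round(target)                   # nearest integer, any parity
        if (y - x) % 2:                     # wrong parity: step toward target
            y += 1 if target > y else -1
        out.append(y)
    return out

# Toy check with ring elements collapsed to integers and key s = (1, s1).
rng = random.Random(2)
q, q_new = 1_000_003, 1_009                 # odd moduli, q_new much smaller than q
s = [1, rng.choice([-1, 1])]
a, e = 1, 37                                # plaintext bit; noise n = 2e + a = 75
c1 = rng.randrange(q)
c = [(2 * e + a - c1 * s[1]) % q, c1]
c_new = switch_modulus(c, q, q_new)
noise_new = center(sum(x * y for x, y in zip(c_new, s)), q_new)
print(noise_new)                            # bounded by (q_new/q)*75 + 2, i.e. at most 2
```

Why the plaintext bit survives: writing $\langle \mathbf{c}, \mathbf{s} \rangle = n + kq$, the scaled vector gives $\langle \mathbf{c}', \mathbf{s} \rangle = n' + kq'$ for the same integer $k$, and since $\mathbf{c}' \equiv \mathbf{c} \pmod 2$ with $q$ and $q'$ both odd, $n'$ and $n$ have the same parity.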

A BGV-type cryptosystem has a chain of moduli, $q_0 < q_1 < \cdots < q_{L-1}$, where fresh ciphertexts are
with respect to the largest modulus $q_{L-1}$. During homomorphic evaluation, every time the (estimated) noise
grows too large we apply modulus switching from $q_i$ to $q_{i-1}$ in order to decrease it back. Eventually we get
ciphertexts with respect to the smallest modulus $q_0$, and we cannot compute on them anymore (except by
using bootstrapping).

Automorphisms. In addition to adding and multiplying polynomials, another useful operation is converting
the polynomial $a(X) \in \mathbb{A}$ to $a^{(i)}(X) \stackrel{\mathrm{def}}{=} a(X^i) \bmod \Phi_m(X)$. Denoting by $\kappa_i$ the transformation
$\kappa_i : a \mapsto a^{(i)}$, it is a standard fact that the set of transformations $\{\kappa_i : i \in (\mathbb{Z}/m\mathbb{Z})^*\}$ forms a group
under composition (which is the Galois group $\mathrm{Gal}(\mathbb{Q}(\zeta_m)/\mathbb{Q})$), and this group is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^*$.
In [5, 15] it was shown that applying the transformations $\kappa_i$ to the plaintext polynomials is very useful; some
more examples of its use can be found in our Section 4.

Denoting by $\mathbf{c}^{(i)}, \mathbf{s}^{(i)}$ the vectors obtained by applying $\kappa_i$ to each entry in $\mathbf{c}, \mathbf{s}$, respectively, it was shown
in [5, 15] that if $\mathbf{c}$ is a valid ciphertext encrypting $a$ with respect to key $\mathbf{s}$ and modulus $q$, then $\mathbf{c}^{(i)}$ is a valid
ciphertext encrypting $a^{(i)}$ with respect to key $\mathbf{s}^{(i)}$ and the same modulus $q$. Moreover, the norm of the noise
remains the same under this operation. We remark that we can apply key-switching to $\mathbf{c}^{(i)}$ in order to get an
encryption of $a^{(i)}$ with respect to the original key $\mathbf{s}$.
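The following Python sketch illustrates $\kappa_i$ under toy assumptions ($m = 8$, so $\Phi_8(X) = X^4 + 1$; $q = 17$ and $\zeta = 2$ for evaluation; the helper name `automorphism` is hypothetical). It substitutes $X^i$ and reduces with $X^8 = 1$ and $X^4 = -1$, then checks two facts stated above: in evaluation representation $\kappa_i$ merely permutes the evaluation points ($\zeta^j \mapsto \zeta^{ij}$), and composition follows multiplication in $(\mathbb{Z}/m\mathbb{Z})^*$.

```python
Q, ZETA = 17, 2                    # toy prime with 8 | Q - 1; 2 has order 8 mod 17
M, HALF = 8, 4                     # m = 8, Phi_8(X) = X^4 + 1
UNITS = [1, 3, 5, 7]               # (Z/8Z)^*

def automorphism(a, i):
    """kappa_i: a(X) -> a(X^i) mod X^4 + 1, using X^8 = 1 and X^4 = -1."""
    out = [0] * HALF
    for k, coeff in enumerate(a):
        e = (k * i) % M
        if e < HALF:
            out[e] += coeff
        else:
            out[e - HALF] -= coeff
    return out

def poly_eval(a, x):
    """a(x) mod Q."""
    return sum(c * pow(x, k, Q) for k, c in enumerate(a)) % Q

a = [3, 1, 4, 1]
for i in UNITS:
    b = automorphism(a, i)
    # In evaluation form, kappa_i only permutes the points: b(zeta^j) = a(zeta^(i*j)).
    assert [poly_eval(b, pow(ZETA, j, Q)) for j in UNITS] == \
           [poly_eval(a, pow(ZETA, i * j, Q)) for j in UNITS]

# Composition: kappa_5(kappa_3(a)) = a(X^15) = a(X^7) = kappa_7(a), since 15 = 7 (mod 8).
assert automorphism(automorphism(a, 3), 5) == automorphism(a, 7)
```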
2.3  Computing  on  Packed  Ciphertexts

Smart and Vercauteren observed [26, 27] that the plaintext space $\mathbb{A}_2$ can be viewed as a vector of “plaintext
slots”, by an application of the polynomial Chinese Remainder Theorem. Specifically, if the ring polynomial
$\Phi_m(X)$ factors modulo 2 into a product of irreducible factors $\Phi_m(X) = \prod_{j=0}^{\ell-1} F_j(X) \pmod 2$, then a
plaintext polynomial $a(X) \in \mathbb{A}_2$ can be viewed as encoding $\ell$ different small polynomials, $a_j = a \bmod F_j$.
Just like for integer Chinese Remaindering, addition and multiplication in $\mathbb{A}_2$ correspond to element-wise
addition and multiplication of the vectors of slots.
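To see the slot structure on a small case, take $m = 7$ (an assumption for illustration): modulo 2, $\Phi_7(X) = (X^3 + X + 1)(X^3 + X^2 + 1)$, so $\mathbb{A}_2$ has $\ell = 2$ slots, each holding an element of $\mathrm{GF}(2^3)$. The Python sketch below stores binary polynomials as integer bit masks (bit $k$ is the coefficient of $X^k$) and checks that addition (XOR) and multiplication in $\mathbb{A}_2$ act slot-wise.

```python
def gf2_mul(a, b):
    """Carry-less product of two GF(2)[X] polynomials stored as bit masks."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, f):
    """Remainder of a modulo f in GF(2)[X]."""
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a

PHI_7 = 0b1111111                  # Phi_7(X) = X^6 + X^5 + ... + X + 1
FACTORS = [0b1011, 0b1101]         # X^3 + X + 1 and X^3 + X^2 + 1
assert gf2_mul(FACTORS[0], FACTORS[1]) == PHI_7    # the factorization mod 2

def slots(a):
    """Decode a plaintext polynomial a in A_2 into its slots a mod F_j."""
    return [gf2_mod(a, f) for f in FACTORS]

a, b = 0b101101, 0b010011
# Addition in A_2 (XOR) is slot-wise XOR.
assert slots(a ^ b) == [x ^ y for x, y in zip(slots(a), slots(b))]
# Multiplication in A_2 is slot-wise multiplication in each GF(2)[X]/F_j.
product = gf2_mod(gf2_mul(a, b), PHI_7)
assert slots(product) == [gf2_mod(gf2_mul(x, y), f)
                          for x, y, f in zip(slots(a), slots(b), FACTORS)]
```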

The effect of the automorphisms is a little more involved. When $i$ is a power of two, the transformation
$\kappa_i : a \mapsto a^{(i)}$ is just applied to each slot separately. When $i$ is not a power of two, the transformation $\kappa_i$
has the effect of roughly shifting the values between the different slots. For example, for some parameters
we could get a cyclic shift of the vector of slots: if $a$ encodes the vector $(a_0, a_1, \dots, a_{\ell-1})$, then $\kappa_i(a)$ (for
some $i$) could encode the vector $(a_{\ell-1}, a_0, \dots, a_{\ell-2})$. This was used in [15] to devise efficient procedures
for applying arbitrary permutations to the plaintext slots.

We note that the values in the plaintext slots are not just bits; rather, they are polynomials modulo the
irreducible $F_j$'s, so they can be used to represent elements in extension fields $\mathrm{GF}(2^d)$. In particular, in
some of our AES implementations we used the plaintext slots to hold elements of $\mathrm{GF}(2^8)$, and encrypt one
byte of the AES state in each slot. Then we can use an adaptation of the techniques from [15] to permute the
slots when performing the AES row-shift and column-mix.

3  General-Purpose  Optimizations 

Below we summarize our optimizations that are not tied directly to the AES circuit and can be used also in
homomorphic evaluation of other circuits. Underlying many of these optimizations is our choice of keeping
ciphertext and key-switching matrices in evaluation (double-CRT) representation. Our chain of moduli is
defined via a set of primes of roughly the same size, $p_0, \dots, p_{L-1}$, all chosen such that $\mathbb{Z}/p_i\mathbb{Z}$ has a primitive $m$'th
root of unity. (In other words, $m \mid p_i - 1$ for all $i$.) For $t = 0, \dots, L-1$ we then define our $t$'th modulus
as $q_t = \prod_{i=0}^{t} p_i$. The primes $p_0$ and $p_{L-1}$ are special ($p_0$ is chosen to ensure decryption works, and $p_{L-1}$ is
chosen to control noise immediately after encryption); however, all other primes $p_i$ are of size $2^{17} < p_i < 2^{20}$
if $L < 100$; see Appendix C.

In the $t$-th level of the scheme we have ciphertexts consisting of elements in $\mathbb{A}_{q_t}$ (i.e., polynomials
modulo $(\Phi_m(X), q_t)$). We represent an element $c \in \mathbb{A}_{q_t}$ by a $\phi(m) \times (t+1)$ “matrix” of its evaluations
at the primitive $m$-th roots of unity modulo the primes $p_0, \dots, p_t$. Computing this representation from the
coefficient representation of $c$ involves reducing $c$ modulo the $p_i$'s and then $t+1$ invocations of the FFT
algorithm, modulo each of the $p_i$ (picking only the FFT coefficients corresponding to $(\mathbb{Z}/m\mathbb{Z})^*$). To convert
back to coefficient representation we invoke the inverse FFT algorithm $t+1$ times, each time padding the
$\phi(m)$-vector of evaluation points with $m - \phi(m)$ zeros (for the evaluations at the non-primitive roots of
unity). This yields the coefficients of $t+1$ polynomials modulo $(X^m - 1, p_i)$ for $i = 0, \dots, t$; we then
reduce each of these polynomials modulo $(\Phi_m(X), p_i)$ and apply Chinese Remainder interpolation. We
stress that we try to perform these transformations as rarely as we can.
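A minimal Python sketch of how such a chain of primes could be generated, assuming simple trial division and a plain scan over candidates $p \equiv 1 \pmod m$. It deliberately ignores the special choice of $p_0$ and $p_{L-1}$ and is not the parameter-selection procedure of Appendix C; the function name `modulus_chain` and the example $m = 1024$ are hypothetical.

```python
def is_prime(n):
    """Trial division; adequate for primes of about 20 bits."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def modulus_chain(m, count, lo=2**17, hi=2**20):
    """Return `count` primes p in [lo, hi) with p = 1 (mod m), plus the chain
    q_t = p_0 * ... * p_t. Hypothetical helper: special p_0 / p_{L-1} ignored."""
    primes = []
    p = lo + (1 - lo) % m                  # smallest p >= lo with p % m == 1
    while len(primes) < count and p < hi:
        if is_prime(p):
            primes.append(p)
        p += m
    chain, q = [], 1
    for p in primes:
        q *= p
        chain.append(q)
    return primes, chain

primes, chain = modulus_chain(m=1024, count=5)
print(primes)
```

Because every $p_i \equiv 1 \pmod m$, each $\mathbb{Z}/p_i\mathbb{Z}$ contains a primitive $m$-th root of unity, which is what the per-prime FFTs described above require.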

3.1  A  New  Variant  of  Key  Switching 

As described in Section 2, the key-switching transformation introduces an additive factor of $2\langle \mathbf{c}', \mathbf{e} \rangle$ in
the noise, where $\mathbf{c}'$ is the input ciphertext and $\mathbf{e}$ is the noise component in the key-switching matrix. To
keep the noise magnitude below the modulus $q$, it seems that we need to ensure that the ciphertext $\mathbf{c}'$
itself has low norm. In BGV [5] this was done by representing $\mathbf{c}'$ as a fixed linear combination of small
vectors, i.e. $\mathbf{c}' = \sum_i 2^i \mathbf{c}'_i$ with $\mathbf{c}'_i$ the vector of $i$'th bits in $\mathbf{c}'$. Considering the high-dimension ciphertext
$\mathbf{c}^* = (\mathbf{c}'_0 \,|\, \mathbf{c}'_1 \,|\, \mathbf{c}'_2 \,|\, \cdots)$ and secret key $\mathbf{s}^* = (\mathbf{s}' \,|\, 2\mathbf{s}' \,|\, 4\mathbf{s}' \,|\, \cdots)$, we note that we have $\langle \mathbf{c}^*, \mathbf{s}^* \rangle = \langle \mathbf{c}', \mathbf{s}' \rangle$, and $\mathbf{c}^*$
has low norm (since it consists of 0-1 polynomials). BGV therefore included in the public key the matrix
$W = W[\mathbf{s}^* \to \mathbf{s}]$ (rather than $W[\mathbf{s}' \to \mathbf{s}]$), and its key-switching transformation computes $\mathbf{c}^*$ from $\mathbf{c}'$
and sets $\mathbf{c} = W \cdot \mathbf{c}^*$.
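The BGV decomposition is easy to check numerically. This Python sketch works on integer entries modulo a toy modulus (rather than on polynomials) and uses hypothetical helper names; it confirms $\langle \mathbf{c}^*, \mathbf{s}^* \rangle = \langle \mathbf{c}', \mathbf{s}' \rangle \pmod q$ and makes the cost visible: the vectors grow by a factor of about $\log q$.

```python
import random

Q = 2**16 + 1                  # toy modulus (assumed); entries are in [0, Q)
BITS = Q.bit_length()          # number of bit-planes, about log2(Q)

def bit_decompose(c):
    """c* = (c'_0 | c'_1 | ...), where block i holds the i'th bit of every entry."""
    return [(x >> i) & 1 for i in range(BITS) for x in c]

def powers_of_two(s):
    """s* = (s' | 2s' | 4s' | ...), reduced modulo Q, in the same block order."""
    return [(x << i) % Q for i in range(BITS) for x in s]

def inner(u, v):
    return sum(x * y for x, y in zip(u, v)) % Q

rng = random.Random(3)
c_prime = [rng.randrange(Q) for _ in range(3)]
s_prime = [rng.randrange(Q) for _ in range(3)]
c_star, s_star = bit_decompose(c_prime), powers_of_two(s_prime)
print(len(c_prime), "->", len(c_star))   # dimension grows by a factor of BITS
```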

When implementing key-switching, there are two drawbacks to the above approach. First, this increases
the dimension (and hence the size) of the key-switching matrix. This drawback is fatal when evaluating deep
circuits, since having enough memory to keep the key-switching matrices turns out to be the limiting factor
in our ability to evaluate these deep circuits. In addition, for this key-switching we must first convert $\mathbf{c}'$
to coefficient representation (in order to compute the $\mathbf{c}'_i$'s), then convert each of the $\mathbf{c}'_i$'s back to evaluation
representation before multiplying by the key-switching matrix. In level $t$ of the circuit, this seems to require
$\Omega(t \log q_t)$ FFTs.

In this work we propose a different variant: rather than manipulating $\mathbf{c}'$ to decrease its norm, we instead
temporarily increase the modulus $q$. We recall that for a valid ciphertext $\mathbf{c}'$, encrypting plaintext $a$ with
respect to $\mathbf{s}'$ and $q$, we have the equality $\langle \mathbf{c}', \mathbf{s}' \rangle = 2e' + a$ over $\mathbb{A}_q$, for a low-norm polynomial $e'$. This
equality, we note, implies that for every odd integer $p$ we have the equality $\langle \mathbf{c}', p\mathbf{s}' \rangle = 2e'' + a$, holding
over $\mathbb{A}_{pq}$, for the “low-norm” polynomial $e''$ (namely $e'' = p \cdot e' + \frac{p-1}{2} \cdot a$). Clearly, when considered relative
to secret key $p\mathbf{s}'$ and modulus $pq$, the noise in $\mathbf{c}'$ is $p$ times larger than it was relative to $\mathbf{s}'$ and $q$. However,
since the modulus is also $p$ times larger, the noise still has norm sufficiently smaller than the
modulus. In other words, $\mathbf{c}'$ is still a valid ciphertext that encrypts the same plaintext $a$ with respect to secret
key $p\mathbf{s}'$ and modulus $pq$. By taking $p$ large enough, we can ensure that the norm of $\mathbf{c}'$ (which is independent
of $p$) is sufficiently small relative to the modulus $pq$.
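The claim about $e''$ follows from a one-line computation. Writing the equality over $\mathbb{A}_q$ as an equality over $\mathbb{A}$ with some $k \in \mathbb{A}$ (notation introduced here only for this derivation, with all products taken modulo $\Phi_m(X)$), and using that $p$ is odd so that $\frac{p-1}{2}$ is an integer:

$$\langle \mathbf{c}', p\,\mathbf{s}' \rangle = p\,(2e' + a + q\,k) = 2\Big(p\,e' + \tfrac{p-1}{2}\,a\Big) + a + pq\,k \equiv 2e'' + a \pmod{pq}.$$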

We therefore include in the public key a matrix $W = W[p\mathbf{s}' \to \mathbf{s}]$ modulo $pq$ for a large enough odd
integer $p$. (Specifically we need $p \approx q\sqrt{m}$.) Given a ciphertext $\mathbf{c}'$, valid with respect to $\mathbf{s}'$ and $q$, we apply
the key-switching transformation simply by setting $\mathbf{c} = W \cdot \mathbf{c}'$ over $\mathbb{A}_{pq}$. The additive noise term $\langle \mathbf{c}', \mathbf{e} \rangle$ that
we get is now small enough relative to our large modulus $pq$, thus the resulting ciphertext $\mathbf{c}$ is valid with
respect to $\mathbf{s}$ and $pq$. We can now switch the modulus back to $q$ (using our modulus-switching routine), hence
getting a valid ciphertext with respect to $\mathbf{s}$ and $q$.

We note that even though we no longer break $\mathbf{c}'$ into its binary encoding, it seems that we still need to
recover it in coefficient representation in order to compute the evaluations of $\mathbf{c}' \bmod p$. However, since we
do not increase the dimension of the ciphertext vector, this procedure requires only $O(t)$ FFTs in level $t$ (vs.
$O(t \log q_t) = O(t^2)$ for the original BGV variant). Also, the size of the key-switching matrix is reduced by
roughly the same factor of $\log q_t$.

Our new variant comes with a price tag, however: we use key-switching matrices relative to a larger
modulus, but still need the noise term in this matrix to be small. This means that the LWE problem underlying
this key-switching matrix has a larger modulus/noise ratio, implying that we need a larger dimension
to get the same level of security as with the original BGV variant. In fact, since our modulus is more than
squared (from $q$ to $pq$ with $p > q$), the dimension is increased by more than a factor of two. This translates
to more than doubling the size of the key-switching matrix, partly negating the size and running-time advantage that
we get from this variant.

We comment that a hybrid of the two approaches could also be used: we can decrease the norm of $\mathbf{c}'$
only somewhat by breaking it into digits (as opposed to binary bits as in [5]), and then increase the modulus
somewhat until it is large enough relative to the smaller norm of $\mathbf{c}'$. We speculate that the optimal setting in
terms of runtime is found around $p \approx \sqrt{q}$, but so far we have not explored this tradeoff.
3.2  Modulus  Switching  in  Evaluation  Representation

Given an element $c \in \mathbb{A}_{q_t}$ in evaluation (double-CRT) representation relative to $q_t = \prod_{j=0}^{t} p_j$, we want
to modulus-switch to $q_{t-1}$, i.e., scale down by a factor of $p_t$; we call this operation $\mathsf{Scale}(c, q_t, q_{t-1})$. The
output should be $d \in \mathbb{A}$, represented via the same double-CRT format (with respect to $p_0, \dots, p_{t-1}$), such
that (a) $d \equiv c \pmod 2$, and (b) the “rounding error term” $\tau = d - (c/p_t)$ has a very low norm. As $p_t$ is
odd, we can equivalently require that the element $d' \stackrel{\mathrm{def}}{=} p_t \cdot d$ satisfy

(i) $d'$ is divisible by $p_t$,

(ii) $d' \equiv c \pmod 2$, and

(iii) $d' - c$ (which is equal to $p_t \cdot \tau$) has low norm.

Rather than computing $d$ directly, we will first compute $d'$ and then set $d \leftarrow d'/p_t$. Observe that once we
compute $d'$ in double-CRT format, it is easy to output also $d$ in double-CRT format: given the evaluations
for $d'$ modulo $p_j$ ($j < t$), simply multiply them by $p_t^{-1} \bmod p_j$. The algorithm to output $d'$ in double-CRT
format is as follows:

1. Set $\bar{c}$ to be the coefficient representation of $c \bmod p_t$. (Computing this requires a single “small FFT”
modulo the prime $p_t$.)

2. Add or subtract $p_t$ from every odd coefficient of $\bar{c}$, thus obtaining a polynomial $\delta$ with coefficients in
$(-p_t, p_t]$ such that $\delta \equiv \bar{c} \equiv c \pmod{p_t}$ and $\delta \equiv 0 \pmod 2$.

3. Set $d' = c - \delta$, and output it in double-CRT representation.

Since we already have $c$ in double-CRT representation, we only need the double-CRT representation
of $\delta$, which requires $t$ more “small FFTs” modulo the $p_j$'s.

As all the coefficients of $d'$ are within $p_t$ of those of $c$, the “rounding error term” $\tau = (d' - c)/p_t$ has
coefficients of magnitude at most one, hence it has low norm.
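The three steps can be sketched on plain integer coefficients. This Python sketch assumes the coefficient vector of $c$ is already available modulo $q_t$ and skips the double-CRT bookkeeping and the FFTs that the actual procedure relies on; the function names and the toy primes $97$ and $101$ are hypothetical. A second helper shows the final division by $p_t$ carried out residue-by-residue, multiplying by $p_t^{-1} \bmod p_j$.

```python
from fractions import Fraction

def center(x, q):
    """Representative of x modulo q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def scale_coefficients(c, q_t, p_t):
    """Steps 1-3 on integer coefficients of c modulo q_t = q_{t-1} * p_t (p_t odd).
    Returns (d, deltas), where d = (c - delta) / p_t coefficient-wise."""
    d, deltas = [], []
    for v in c:
        x = center(v, q_t)
        delta = center(x, p_t)              # step 1: c mod p_t, in (-p_t/2, p_t/2]
        if delta % 2:                       # step 2: odd -> add or subtract p_t,
            delta += p_t if delta < 0 else -p_t   # landing in (-p_t, p_t]
        d_prime = x - delta                 # step 3: divisible by p_t, = c (mod 2)
        d.append(d_prime // p_t)
        deltas.append(delta)
    return d, deltas

def divide_residues(residues, p_t, primes):
    """Double-CRT view of d = d'/p_t: multiply each residue by p_t^{-1} mod p_j."""
    return [r * pow(p_t, -1, p) % p for r, p in zip(residues, primes)]

p_prev, p_t = 97, 101                   # toy primes: q_{t-1} = 97, q_t = 97 * 101
q_t = p_prev * p_t
c = [5, 4000, 9796, 1234, 7]
d, deltas = scale_coefficients(c, q_t, p_t)
print(d)
```

Each $\delta$ is even and congruent to $c$ modulo $p_t$, so $d' = c - \delta$ meets conditions (i) and (ii), and $|\delta| < p_t$ gives condition (iii).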

The procedure above uses $t+1$ small FFTs in total. This should be compared to the naive method of
just converting everything to coefficient representation modulo the primes ($t+1$ FFTs), CRT-interpolating
the coefficients, dividing and rounding appropriately the large integers (of size $\approx q_t$), CRT-decomposing the
coefficients, and then converting back to evaluation representation ($t+1$ more FFTs). The above approach
makes explicit use of the fact that we are working in a plaintext space modulo 2; in Appendix D we present
a technique which works when the plaintext space is defined modulo a larger modulus.

3.3  Dynamic  Noise  Management 

As described in the literature, BGV-type cryptosystems tacitly assume that each homomorphic operation
is followed by a modulus switch to reduce the noise magnitude. In our implementation, however, we
attach to each ciphertext an estimate of the noise magnitude in that ciphertext, and use these estimates to
decide dynamically when a modulus switch must be performed.

Each modulus switch consumes a level, and hence a goal is to reduce, over a computation, the number of
levels consumed. By paying particular attention to the parameters of the scheme, and by carefully analyzing
how various operations affect the noise, we are able to control the noise much more carefully than in prior
work. In particular, we note that modulus switching is really only necessary just prior to multiplication
(when the noise magnitude is about to get squared); at other times it is acceptable to keep the ciphertexts at
a higher level (with higher noise).





3.4  Randomized  Multiplication  by  Constants 

Our implementation of the AES round function uses just a few multiplication operations (only seven per
byte!), but it requires a relatively large number of multiplications of encrypted bytes by constants. Hence it
becomes important to try to squeeze down the increase in noise when multiplying by a constant. To that
end, we encode a constant polynomial in $\mathbb{A}_2$ as a polynomial with coefficients in $\{-1, 0, 1\}$ rather than in
$\{0, 1\}$. Namely, we have a procedure $\mathsf{Randomize}(a)$ that takes a polynomial $a \in \mathbb{A}_2$ and replaces each
non-zero coefficient with a coefficient chosen uniformly from $\{-1, 1\}$. By a Chernoff bound, we expect
that for $a$ with $h$ nonzero coefficients, the canonical embedding norm of $\mathsf{Randomize}(a)$ is bounded by
$O(\sqrt{h})$ with high probability (assuming that $h$ is large enough for the bound to kick in). This yields a better
bound on the noise increase than the trivial bound of $h$ that we would get if we just multiplied by $a$ itself.
(In Appendix A.5 we present a heuristic argument that we use to bound the noise, which yields the same
asymptotic bounds but slightly better constants.)
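A small Python sketch of $\mathsf{Randomize}$ together with a direct (slow) computation of the canonical embedding norm, taken here as the maximum of $|a(\omega)|$ over the complex primitive $m$-th roots of unity $\omega$. The function names, the choice $m = 64$, and the use of Python's `random` module are illustrative assumptions. Since $-1 \equiv 1 \pmod 2$, the randomized polynomial still represents the same element of $\mathbb{A}_2$, and by the triangle inequality its norm never exceeds $h$; the $O(\sqrt{h})$ behaviour is only probabilistic, so the sketch prints the two norms rather than asserting a bound.

```python
import cmath
import math
import random

def randomize(a, rng):
    """Randomize(a): each non-zero coefficient becomes +1 or -1 uniformly at random."""
    return [rng.choice((-1, 1)) if x else 0 for x in a]

def canonical_norm(a, m):
    """max |a(w)| over the complex primitive m-th roots of unity w (direct evaluation)."""
    best = 0.0
    for k in range(1, m):
        if math.gcd(k, m) == 1:
            w = cmath.exp(2j * cmath.pi * k / m)
            best = max(best, abs(sum(c * w ** i for i, c in enumerate(a))))
    return best

m = 64                                          # toy power-of-two m, so phi(m) = 32
rng = random.Random(4)
a = [rng.randrange(2) for _ in range(m // 2)]   # a 0/1 constant polynomial in A_2
h = sum(a)                                      # number of nonzero coefficients
r = randomize(a, rng)
print(h, canonical_norm(a, m), canonical_norm(r, m))
```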

4  Homomorphic  Evaluation  of  AES 

Next we describe our homomorphic implementation of AES-128. We implemented three distinct implementation
possibilities; we first describe the “packed implementation”, in which the entire AES state is packed
in just one ciphertext. Two other implementations (of byte-slice and bit-slice AES) are described later in
Section 4.2. The “packed” implementation uses the least amount of memory (which turns out to be the
main constraint in our implementation), and also has the fastest running time for a single evaluation. The other
implementation choices, on the other hand, allow more SIMD parallelism, so they can give better amortized
running time when evaluating AES on many blocks in parallel.

A Brief Overview of AES. The AES-128 cipher consists of ten applications of the same keyed round
function (with different round keys). The round function operates on a $4 \times 4$ matrix of bytes, which are
sometimes considered as elements of $\mathbb{F}_{2^8}$. The basic operations that are performed during the round function
are AddKey, SubBytes, ShiftRows, and MixColumns. The AddKey is simply an XOR operation of the current
state with 16 bytes of key; the SubBytes operation consists of an inversion in the field $\mathbb{F}_{2^8}$ followed by
a fixed $\mathbb{F}_2$-affine map on the bits of the element (relative to a fixed polynomial representation of $\mathbb{F}_{2^8}$); the
ShiftRows rotates the entries in row $i$ of the $4 \times 4$ matrix by $i - 1$ places to the left; finally the MixColumns
operation pre-multiplies the state matrix by a fixed $4 \times 4$ matrix.
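For reference, plaintext (non-homomorphic) versions of some of these operations are sketched below in Python: multiplication in $\mathbb{F}_{2^8}$ modulo the AES polynomial $X^8 + X^4 + X^3 + X + 1$, SubBytes as inversion followed by the affine map with constant 0x63, and ShiftRows with 0-indexed rows (row $r$ rotates left by $r$, matching the $i - 1$ of the 1-indexed description). The function names are ours; the checks use the worked examples $\{57\} \cdot \{83\} = \{c1\}$ and $\mathrm{S}(\{53\}) = \{ed\}$ from FIPS-197.

```python
def gf_mul(a, b):
    """Multiply in F_{2^8} = F_2[X]/(X^8 + X^4 + X^3 + X + 1), the AES field."""
    out = 0
    for _ in range(8):
        if b & 1:
            out ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B                       # reduce by the AES polynomial
        b >>= 1
    return out

def gf_inv(a):
    """Inverse via a^254 (a^255 = 1 for a != 0); maps 0 to 0 as AES does."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def sub_byte(x):
    """SubBytes on one byte: field inversion, then the AES affine map (constant 0x63)."""
    b = gf_inv(x)
    out = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i))
        out |= (bit & 1) << i
    return out

def shift_rows(state):
    """state[r][c] with 0-indexed rows: row r rotates left by r places."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

print(hex(gf_mul(0x57, 0x83)), hex(sub_byte(0x53)))
```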

Our Packed Representation of the AES state. For our implementation we chose the native plaintext
space of our homomorphic encryption so as to support operations on the finite field $\mathbb{F}_{2^8}$. To this end we
choose our ring polynomial as $\Phi_m(X)$ that factors modulo 2 into degree-$d$ irreducible polynomials such
that $8 \mid d$. (In other words, the smallest integer $d$ such that $m \mid (2^d - 1)$ is divisible by 8.) This means that our
plaintext slots can hold elements of $\mathbb{F}_{2^d}$, and in particular we can use them to hold elements of $\mathbb{F}_{2^8}$, which
is a sub-field of $\mathbb{F}_{2^d}$. Since we have $\ell = \phi(m)/d$ plaintext slots in each ciphertext, we can represent up to
$\lfloor \ell/16 \rfloor$ complete AES state matrices per ciphertext.
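The slot count is straightforward to compute for a candidate $m$. The Python sketch below (hypothetical helper name; naive $O(m)$ computation of $\phi(m)$) finds $d$ as the multiplicative order of 2 modulo $m$, then $\ell = \phi(m)/d$ and $\lfloor \ell/16 \rfloor$, counting AES states only when $8 \mid d$. It does not check the additional condition on $g$ described next.

```python
from math import gcd

def packing_capacity(m):
    """For odd m: d = order of 2 mod m, l = phi(m)/d slots, and the number of
    complete AES states (16 bytes each) per ciphertext, which requires 8 | d."""
    if m % 2 == 0 or m < 3:
        raise ValueError("m must be an odd integer >= 3")
    d, x = 1, 2 % m
    while x != 1:                       # multiplicative order of 2 modulo m
        x = x * 2 % m
        d += 1
    phi = sum(1 for k in range(1, m) if gcd(k, m) == 1)   # naive Euler phi
    slots = phi // d
    aes_states = slots // 16 if d % 8 == 0 else 0
    return d, slots, aes_states

print(packing_capacity(257))
```

For instance, $m = 257$ gives $d = 16$ (since $2^8 \equiv -1 \pmod{257}$), hence $\ell = 256/16 = 16$ slots and room for exactly one AES state.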

Moreover, we choose our parameter $m$ so that there exists an element $g \in \mathbb{Z}_m^*$ that has order 16 in both $\mathbb{Z}_m^*$ and the quotient group $\mathbb{Z}_m^*/\langle 2 \rangle$. This condition means that if we put 16 plaintext bytes in slots $t, tg, tg^2, tg^3, \ldots$ (for some $t \in \mathbb{Z}_m^*$), then the conjugation operation $X \mapsto X^g$ implements a cyclic right shift over these sixteen plaintext bytes.
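The condition on $g$ can be tested by brute force for small $m$. The sketch below is illustrative only (the function names are hypothetical and the search is exhaustive, so it is meant for small odd $m$); it checks that $g$ has order 16 in $\mathbb{Z}_m^*$ and that its coset has order 16 in $\mathbb{Z}_m^*/\langle 2 \rangle$.

```python
from math import gcd


def mult_order(a, m):
    """Multiplicative order of a modulo m (a must be a unit, m > 1)."""
    d, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        d += 1
    return d


def subgroup_generated_by_2(m):
    """The cyclic subgroup <2> of (Z/mZ)^*, as a set (m odd)."""
    H, x = {1}, 2 % m
    while x != 1:
        H.add(x)
        x = (x * 2) % m
    return H


def quotient_order(g, m, H):
    """Order of the coset g*H in (Z/mZ)^*/H: smallest k >= 1 with g^k in H."""
    k, x = 1, g % m
    while x not in H:
        x = (x * g) % m
        k += 1
    return k


def find_rotation_generator(m, n=16):
    """Smallest g of order n in both (Z/mZ)^* and (Z/mZ)^*/<2>, or None."""
    H = subgroup_generated_by_2(m)
    for g in range(2, m):
        if gcd(g, m) == 1 and mult_order(g, m) == n and quotient_order(g, m, H) == n:
            return g
    return None
```

For example, `find_rotation_generator(257)` returns `None`: $\mathbb{Z}_{257}^*$ is cyclic of order 256, so its only subgroup of order 16 is $\langle 2 \rangle$ itself, and every element of order 16 becomes trivial in the quotient. This is one reason $m$ has to be chosen with care.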
In the computation of the AES round function we use several constants. Some constants are used in the S-box lookup phase to implement the AES bit-affine transformation; these are denoted $\gamma_j$ and $\delta$, for $j = 0, \ldots, 7$. In the row-shift/col-mix part we use a constant $C_{\mathrm{select}}$ that has 1 in the slots corresponding to $t \cdot g^i$ for $i = 0, 4, 8, 12$, and 0 in all the other slots of the form $t \cdot g^i$. (Here slot $t$ is where we put the first AES byte.) We also use '$X$' to denote the constant that has the element $X$ in all the slots.


4.1  Homomorphic  Evaluation  of  the  Basic  Operations 

We now examine each AES operation in turn, and describe how it is implemented homomorphically. For each operation we denote the plaintext polynomial underlying a given input ciphertext $c$ by $a$, and the corresponding content of the $\ell$ plaintext slots is denoted as an $\ell$-vector $(\alpha_i)_{i=1}^{\ell}$, with each $\alpha_i \in \mathbb{F}_{2^8}$.


4.1.1 AddKey and SubBytes

The AddKey is just a simple addition of ciphertexts, which yields a $4 \times 4$ matrix of bytes as the input to the SubBytes operation. We place these 16 bytes in plaintext slots $tg^i$ for $i = 0, 1, \ldots, 15$, using column-ordering to decide which byte goes in which slot; namely, we have


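$$\boldsymbol{\alpha} = [\alpha_{00}\ \alpha_{10}\ \alpha_{20}\ \alpha_{30}\ \alpha_{01}\ \alpha_{11}\ \alpha_{21}\ \alpha_{31}\ \alpha_{02}\ \alpha_{12}\ \alpha_{22}\ \alpha_{32}\ \alpha_{03}\ \alpha_{13}\ \alpha_{23}\ \alpha_{33}],$$

encrypting the input plaintext matrix

$$A = (\alpha_{ij})_{i,j} = \begin{pmatrix} \alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} \\ \alpha_{10} & \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{20} & \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{30} & \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix}.$$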
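The column-ordering rule can be written as a simple index map. In this sketch (hypothetical helper names), position $k = 4j + i$ of the vector holds $\alpha_{ij}$, and position $k$ is the one placed in slot $tg^k$.

```python
def to_slot_vector(A):
    """Column-ordering: [a00, a10, a20, a30, a01, ..., a33]; position k = 4*j + i holds A[i][j]."""
    return [A[k % 4][k // 4] for k in range(16)]


def from_slot_vector(v):
    """Inverse of to_slot_vector: rebuild the 4x4 state matrix."""
    return [[v[4 * j + i] for j in range(4)] for i in range(4)]


def slot_labels(t, g, m):
    """Labels t*g^k mod m (k = 0..15) of the slots that hold positions 0..15."""
    return [t * pow(g, k, m) % m for k in range(16)]


A = [[10 * i + j for j in range(4)] for i in range(4)]   # entry A[i][j] encodes (i, j)
print(to_slot_vector(A)[:5])                             # [0, 10, 20, 30, 1]
```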

During S-box lookup, each plaintext byte $\alpha_{ij}$ should be replaced by $\beta_{ij} = S(\alpha_{ij})$, where $S(\cdot)$ is a fixed permutation on the bytes. Specifically, $S(x)$ is obtained by first computing $y = x^{-1}$ in $\mathbb{F}_{2^8}$ (with 0 mapped to 0), then applying a bitwise affine transformation $z = T(y)$, where elements in $\mathbb{F}_{2^8}$ are treated as bit strings with representation polynomial $G(X) = X^8 + X^4 + X^3 + X + 1$.
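For reference, the plaintext S-box can be computed directly from this description. The sketch below is a straightforward (not constant-time) Python rendering, assuming the standard byte encoding of $\mathbb{F}_{2^8}$ modulo $G(X)$ (the integer 0x11B) and the FIPS-197 form of the affine map with constant 0x63; it is meant as a plaintext test oracle for the homomorphic pipeline, not as part of it.

```python
def gf_mul(a, b):
    """Multiply in F_{2^8} = F_2[X]/G(X), G(X) = X^8 + X^4 + X^3 + X + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(x):
    """x^{-1} = x^254 for x != 0; 0 maps to 0, as in the AES S-box."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r


def affine(y):
    """Bitwise affine map T: z_i = y_i ^ y_{i+4} ^ y_{i+5} ^ y_{i+6} ^ y_{i+7} ^ c_i, c = 0x63."""
    z = 0
    for i in range(8):
        bit = ((y >> i) ^ (y >> ((i + 4) % 8)) ^ (y >> ((i + 5) % 8))
               ^ (y >> ((i + 6) % 8)) ^ (y >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        z |= bit << i
    return z


def sbox(x):
    """S(x) = T(x^{-1})."""
    return affine(gf_inv(x))


print(hex(sbox(0x53)))   # 0xed
```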

We implement $\mathbb{F}_{2^8}$ inversion followed by the $\mathbb{F}_2$ affine transformation using the Frobenius automorphisms, $X \mapsto X^{2^j}$. Recall that for a power of two $k = 2^j$, the transformation $\kappa_k(a(X)) = (a(X^k) \bmod \Phi_m(X))$ is applied separately to each slot, hence we can use it to transform the vector $(\alpha_i)_{i=1}^{\ell}$ into $(\alpha_i^k)_{i=1}^{\ell}$. We note that applying the Frobenius automorphisms to ciphertexts has almost no influence on the noise magnitude, and hence it does not consume any levels.⁴

Inversion over $\mathbb{F}_{2^8}$ is done using essentially the same procedure as Algorithm 2 from [25] for computing $\beta = \alpha^{-1} = \alpha^{254}$. This procedure takes only three Frobenius automorphisms and four multiplications, arranged in a depth-3 circuit (see details below). To apply the AES $\mathbb{F}_2$ affine transformation, we use the fact that any $\mathbb{F}_2$ affine transformation can be computed as an $\mathbb{F}_{2^8}$ affine transformation over the conjugates. Thus there are constants $\gamma_0, \gamma_1, \ldots, \gamma_7, \delta \in \mathbb{F}_{2^8}$ such that the AES affine transformation $T_{\mathrm{AES}}(\cdot)$ can be expressed as $T_{\mathrm{AES}}(\beta) = \delta + \sum_{j=0}^{7} \gamma_j \cdot \beta^{2^j}$ over $\mathbb{F}_{2^8}$. We therefore again apply the Frobenius automorphisms to compute eight ciphertexts encrypting the polynomials $\kappa_k(b)$ for $k = 1, 2, 4, \ldots, 128$ (where $b$ is the plaintext polynomial underlying the ciphertext that encrypts the inverses), and take the appropriate linear combination (with the $\gamma_j$'s as coefficients) to get an encryption of the vector $(T_{\mathrm{AES}}(\alpha_i^{-1}))_{i=1}^{\ell}$. For our parameters, a multiplication-by-constant operation consumes roughly half a level in terms of added noise.
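The existence of the constants $\gamma_j$ and $\delta$ can be made concrete by solving for them. The following sketch assumes a single fixed encoding modulo $G(X)$, so it ignores the per-slot projection that the implementation needs. It sets $\delta = T_{\mathrm{AES}}(0)$ and finds $\gamma_0, \ldots, \gamma_7$ by Gauss-Jordan elimination over $\mathbb{F}_{2^8}$ on the conditions $\sum_j \gamma_j (X^k)^{2^j} = T_{\mathrm{AES}}(X^k) + \delta$ for the basis elements $X^0, \ldots, X^7$. All helper names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(x):
    """x^254 = x^{-1} for x != 0 (square-and-multiply)."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r


def frob(y, j):
    """y^(2^j): the Frobenius map applied j times."""
    for _ in range(j):
        y = gf_mul(y, y)
    return y


def t_aes(y):
    """Bitwise AES affine map (FIPS-197 form), constant 0x63."""
    z = 0
    for i in range(8):
        bit = ((y >> i) ^ (y >> ((i + 4) % 8)) ^ (y >> ((i + 5) % 8))
               ^ (y >> ((i + 6) % 8)) ^ (y >> ((i + 7) % 8)) ^ (0x63 >> i))
        z |= (bit & 1) << i
    return z


def solve_gf(M, rhs):
    """Gauss-Jordan elimination for M x = rhs over F_{2^8} (M assumed invertible)."""
    n = len(M)
    A = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        inv = gf_inv(A[col][col])
        A[col] = [gf_mul(inv, v) for v in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [v ^ gf_mul(f, w) for v, w in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


delta = t_aes(0)                                      # constant term, 0x63
basis = [1 << k for k in range(8)]                    # X^0, ..., X^7 as bytes
M = [[frob(e, j) for j in range(8)] for e in basis]   # Moore matrix: invertible
gammas = solve_gf(M, [t_aes(e) ^ delta for e in basis])


def t_aes_via_conjugates(beta):
    """delta + sum_j gamma_j * beta^(2^j): the form evaluated homomorphically."""
    acc = delta
    for j, g in enumerate(gammas):
        acc ^= gf_mul(g, frob(beta, j))
    return acc
```

The Moore matrix is invertible because $X^0, \ldots, X^7$ are linearly independent over $\mathbb{F}_2$, and agreement on a basis implies agreement everywhere since both sides are $\mathbb{F}_2$-affine.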


⁴ It does increase the noise magnitude somewhat, because we need to do key switching after these automorphisms. But this is only a small influence, and we will ignore it here.
One subtle implementation detail to note here is that although our plaintext slots all hold elements of the same field $\mathbb{F}_{2^8}$, they hold these elements with respect to different polynomial encodings. The AES affine transformation, on the other hand, is defined with respect to one particular fixed polynomial encoding. This means that we must implement in the $i$-th slot not the affine transformation $T_{\mathrm{AES}}(\cdot)$ itself but rather the projection of this transformation onto the appropriate polynomial encoding. When we take the affine transformation of the eight ciphertexts encrypting $b_j = \kappa_{2^j}(b)$, we therefore multiply the encryption of $b_j$ not by a constant that has $\gamma_j$ in all the slots, but rather by a constant that has in slot $i$ the projection of $\gamma_j$ to the polynomial encoding of slot $i$.

Below we provide a pseudo-code description of our S-box lookup implementation, together with an approximation of the levels that are consumed by these operations. (These approximations are, however, somewhat under-estimates.)


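Input: ciphertext $c$ (at level $t$)
// Compute $c_{254} = c^{-1}$
1. $c_2 \leftarrow c \gg 2$ // Frobenius $X \mapsto X^2$ (level $t$)
2. $c_3 \leftarrow c \times c_2$ // Multiplication (level $t+1$)
3. $c_{12} \leftarrow c_3 \gg 4$ // Frobenius $X \mapsto X^4$ (level $t+1$)
4. $c_{14} \leftarrow c_{12} \times c_2$ // Multiplication (level $t+2$)
5. $c_{15} \leftarrow c_{12} \times c_3$ // Multiplication (level $t+2$)
6. $c_{240} \leftarrow c_{15} \gg 16$ // Frobenius $X \mapsto X^{16}$ (level $t+2$)
7. $c_{254} \leftarrow c_{240} \times c_{14}$ // Multiplication (level $t+3$)
// Affine transformation over $\mathbb{F}_2$
8. $c'_{2^j} \leftarrow c_{254} \gg 2^j$ for $j = 0, 1, 2, \ldots, 7$ // Frobenius $X \mapsto X^{2^j}$ (level $t+3$)
9. $c'' \leftarrow \delta + \sum_{j=0}^{7} \gamma_j \times c'_{2^j}$ // Linear combination over $\mathbb{F}_{2^8}$ (level $t+3.5$)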
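Steps 1 to 7 can be checked on plaintext bytes. In the sketch below (hypothetical names), each value carries a multiplicative-depth counter: Frobenius maps are modeled by repeated squaring but, as in the text, are treated as free, while each multiplication adds one to the depth. It only tracks multiplicative depth; it does not model the extra half level of step 9 or any noise growth.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def frob(u, e):
    """u = (value, depth); raise value to e = 2^k. Automorphisms consume no depth."""
    x, depth = u
    while e > 1:
        x = gf_mul(x, x)
        e //= 2
    return x, depth


def mul(u, v):
    """Multiplication: depth is one more than that of the deeper input."""
    return gf_mul(u[0], v[0]), max(u[1], v[1]) + 1


def inverse_chain(alpha):
    c = (alpha, 0)
    c2 = frob(c, 2)          # step 1: alpha^2
    c3 = mul(c, c2)          # step 2: alpha^3
    c12 = frob(c3, 4)        # step 3: alpha^12
    c14 = mul(c12, c2)       # step 4: alpha^14
    c15 = mul(c12, c3)       # step 5: alpha^15
    c240 = frob(c15, 16)     # step 6: alpha^240
    return mul(c240, c14)    # step 7: alpha^254 = alpha^(-1), depth 3
```

For instance, `inverse_chain(0x53)` returns the pair `(0xCA, 3)`, so `gf_mul(0x53, 0xCA) == 1`.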

4.1.2  ShiftRows  and  MixColumns 

As is commonly done, we interleave the ShiftRows/MixColumns operations, viewing both as a single linear transformation over vectors from $(\mathbb{F}_{2^8})^{16}$. As mentioned above, by a careful choice of the parameter $m$ and the placement of the AES state bytes in our plaintext slots, we can implement a rotation-by-$i$ of the rows of the AES matrix as a single automorphism operation $X \mapsto X^{g^i}$ (for some element $g \in (\mathbb{Z}/m\mathbb{Z})^*$). Given the ciphertext $c''$ after the SubBytes step, we use these operations (in conjunction with $\ell$-SELECT operations, as described in [15]) to compute four ciphertexts corresponding to the appropriate permutations of the 16 bytes (in each of the $\ell/16$ different input blocks). These four ciphertexts are combined via a linear operation (with coefficients 1, $X$, and $(1+X)$) to obtain the final result of this round function. Below is a pseudo-code of this implementation and an approximation for the levels that it consumes (starting from $t + 3.5$). We note that the permutations are implemented using automorphisms and multiplication by constants, thus we expect them to consume roughly $1/2$ level.


Input: ciphertext $c''$ (at level $t+3.5$)
10. $c^*_j \leftarrow \pi_j(c'')$ for $j = 1, 2, 3, 4$ // Permutations (level $t+4.0$)
11. Output $X \cdot c^*_1 + (X+1) \cdot c^*_2 + c^*_3 + c^*_4$ // Linear combination (level $t+4.5$)
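The decomposition into four permuted copies can be checked on a plaintext state. In this sketch the permutations $\pi_j$ are our own reconstruction (ShiftRows followed by an upward rotation of each column by $j-1$ positions), chosen so that $X \cdot \pi_1 + (X+1) \cdot \pi_2 + \pi_3 + \pi_4$ equals MixColumns(ShiftRows(state)); the text does not spell out $\pi_j$, so treat this as an assumption. Rows are 0-indexed here, so row $r$ is rotated left by $r$ places.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def shift_rows(s):
    """Rotate row r (0-indexed) left by r places."""
    return [[s[r][(c + r) % 4] for c in range(4)] for r in range(4)]


def mix_columns(s):
    """Pre-multiply by the circulant matrix with first row (X, X+1, 1, 1) = (2, 3, 1, 1)."""
    return [[gf_mul(2, s[r][c]) ^ gf_mul(3, s[(r + 1) % 4][c])
             ^ s[(r + 2) % 4][c] ^ s[(r + 3) % 4][c]
             for c in range(4)] for r in range(4)]


def pi(s, j):
    """Hypothetical pi_j (j = 1..4): ShiftRows, then row r takes row (r + j - 1) mod 4."""
    return [[s[(r + j - 1) % 4][(c + r + j - 1) % 4] for c in range(4)] for r in range(4)]


def shift_mix_via_permutations(s):
    X = 0x02
    p1, p2, p3, p4 = (pi(s, j) for j in range(1, 5))
    return [[gf_mul(X, p1[r][c]) ^ gf_mul(X ^ 1, p2[r][c]) ^ p3[r][c] ^ p4[r][c]
             for c in range(4)] for r in range(4)]


rng = random.Random(7)
S = [[rng.randrange(256) for _ in range(4)] for _ in range(4)]
assert shift_mix_via_permutations(S) == mix_columns(shift_rows(S))
```

Here `X ^ 1` is the byte 0x03, i.e. the field element $X + 1$.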
4.1.3 The Cost of One Round Function


The above description yields an estimate of 5 levels for implementing one round function. This is, however, an underestimate. The actual number of levels depends on details such as how sparse the scalars are with respect to the embedding via $\Phi_m$ in a given parameter set, as well as the accumulation of noise with respect to additions, Frobenius operations, etc. Running over many different parameter sets, we find that the average number of levels per round for this method varies between 5.0 and 6.0.

We mention that the byte-slice and bit-slice implementations, given in Section 4.2 below, can consume fewer levels per round function, since they do not need to permute slots inside a single ciphertext. Specifically, for our byte-sliced implementation, we only need 4.5 to 5.0 levels per round on average. However, since we need to manipulate many more ciphertexts, the implementation takes much more time per evaluation and requires much more memory. On the other hand, it offers wider parallelism, so it yields better amortized time per block. Our bit-sliced implementation should theoretically consume the least number of levels (by purely counting multiplication gates), but the noise introduced by additions means the average number of levels consumed per round varies from 5.0 up to 10.0.

4.2  Byte-  and  Bit-Slice  Implementations 

In the byte-sliced implementation we use sixteen distinct ciphertexts to represent a single state matrix. (But since each ciphertext can hold $\ell$ plaintext slots, these 16 ciphertexts can hold the state of $\ell$ different AES blocks.) In this representation there is no interaction between the slots, thus we operate with pure $\ell$-fold SIMD operations. The AddKey and SubBytes steps are exactly as above (except applied to 16 ciphertexts rather than a single one). The permutations in the ShiftRows/MixColumns step are now “for free”, but the scalar multiplication in MixColumns still consumes another level in the modulus chain.

Using  the  same  estimates  as  above,  we  expect  the  number  of  levels  per  round  to  be  roughly  four  (as 
opposed  to  the  4.5  of  the  packed  implementation).  In  practice,  again  over  many  parameter  sets,  we  find  the 
average  number  of  levels  consumed  per  round  is  between  4.5  and  5.0. 

For the bit-sliced implementation we represent the entire round function as a binary circuit, and we use 128 distinct ciphertexts (one per bit of the state matrix). However, each set of 128 ciphertexts is able to represent a total of $\ell$ distinct blocks. The main issue here is how to create a circuit for the round function which is as shallow, in terms of the number of multiplication gates, as possible. Again the main issue is the SubBytes operation, as all the other operations are essentially linear. To implement SubBytes we used the “depth-16” circuit of Boyar and Peralta [3], which consumes four levels. The rest of the round function can be represented as a set of bit-additions. Thus, implementing this method means that we consume a minimum of four levels on computing an entire round function. However, the extensive additions within the Boyar-Peralta circuit mean that we actually end up consuming a lot more: on average, between 5.0 and 10.0 levels per round.

4.3  Performance  Details 

As remarked in the introduction, we implemented the above variant of evaluating AES homomorphically on a very large memory machine, namely a machine with 256 GB of RAM. First, parameters were selected, as in Appendix C, to cope with 60 levels of computation, and a public/private key pair was generated, along with the key-switching data for multiplication operations and conjugation with respect to the Galois group.

The input to the actual computation was an AES plaintext block and the eleven round keys, each of which was encrypted using our homomorphic encryption scheme. Thus the input consisted of eleven packed ciphertexts. Producing the encrypted key schedule took around half an hour. Evaluating the entire ten rounds of AES took just over 36 hours; however, each of our ciphertexts could hold 864 plaintext slots of elements in $\mathbb{F}_{2^8}$, thus we could have processed 54 such AES blocks in this time period. This would result in a throughput of around forty minutes per AES block.
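The throughput figure follows from simple arithmetic on the numbers just quoted (rounding "just over 36 hours" down to 36):

```python
slots_per_ciphertext = 864
aes_blocks = slots_per_ciphertext // 16        # 16 bytes per AES state -> 54 blocks
total_minutes = 36 * 60                        # ten rounds took just over 36 hours
minutes_per_block = total_minutes / aes_blocks
print(aes_blocks, minutes_per_block)           # 54 40.0
```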

We note that as the algorithm progressed the operations became faster. The first round of the AES function took 7 hours, whereas the penultimate round took 2 hours and the last round took 30 minutes. Recall that the last AES round is somewhat simpler, as it does not involve a MixColumns operation.

Whilst our other two implementation choices (given in Section 4.2 above) may seem to yield better amortized per-block timing, the increase in memory requirements and data actually makes them less attractive when encrypting a single block. For example, just encrypting the key schedule in the Byte-Sliced variant takes just under 5 hours (with 50 levels), with an entire encryption taking 65 hours (12 hours for the first round, and between 4 and 5 hours for each of the penultimate and final rounds). This, however, equates to an amortized time of just over five minutes per block.

The Bit-Sliced variant requires over 150 hours just to encrypt the key schedule (with 60 levels), and evaluating a single round takes so long that our program timed out before even a single round was evaluated.
A More Details

Following [22, 5, 15, 27] we utilize rings defined by cyclotomic polynomials, $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$. We let $\mathbb{A}_q$ denote the set of elements of this ring reduced modulo various (possibly composite) moduli $q$. The ring $\mathbb{A}$ is the ring of integers of the $m$-th cyclotomic number field $\mathbb{K}$.

A.1 Plaintext Slots

In our scheme plaintexts will be elements of $\mathbb{A}_2$, and the polynomial $\Phi_m(X)$ factors modulo 2 into $\ell$ irreducible factors, $\Phi_m(X) = F_1(X) \cdot F_2(X) \cdots F_\ell(X) \pmod 2$, all of degree $d = \phi(m)/\ell$. Just as in [5, 15, 27], each factor corresponds to a “plaintext slot”. That is, we view a polynomial $a \in \mathbb{A}_2$ as representing an $\ell$-vector $(a \bmod F_i)_{i=1}^{\ell}$.

It is a standard fact that the Galois group $\mathcal{G}al = \mathrm{Gal}(\mathbb{Q}(\zeta_m)/\mathbb{Q})$ consists of the mappings $\kappa_k : a(X) \mapsto a(X^k) \bmod \Phi_m(X)$ for all $k$ co-prime with $m$, and that it is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^*$. As noted in [15], for each $i, j \in \{1, 2, \ldots, \ell\}$ there is an element $\kappa_k \in \mathcal{G}al$ which sends an element in slot $i$ to an element in slot $j$. Namely, if $b = \kappa_k(a)$ then the element in the $j$-th slot of $b$ is the same as that in the $i$-th slot of $a$. In addition, $\mathcal{G}al$ contains the Frobenius elements, $X \mapsto X^{2^i}$, which also act as Frobenius on the individual slots separately.
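The correspondence between slots and the Galois group can be explored numerically. In the sketch below (hypothetical helper names, odd $m$), slots are labelled by the cosets of $\langle 2 \rangle$ in $(\mathbb{Z}/m\mathbb{Z})^*$, one coset per irreducible factor $F_i$ (the exponents $j$ of its roots $\zeta^j$); multiplying labels by $k$ shows how $\kappa_k$ permutes the slots, up to the choice of direction convention. Frobenius ($k = 2$) fixes every coset, matching the statement that it acts on each slot separately.

```python
from math import gcd


def slot_cosets(m):
    """Cosets of <2> in (Z/mZ)^*; each one labels one plaintext slot."""
    seen, cosets = set(), []
    for k in range(1, m):
        if gcd(k, m) != 1 or k in seen:
            continue
        coset, x = [], k
        while x not in coset:
            coset.append(x)
            x = (2 * x) % m
        seen.update(coset)
        cosets.append(coset)
    return cosets


def coset_permutation(k, m):
    """Map slot index i to the index of the coset k * (coset i)."""
    cosets = slot_cosets(m)
    where = {x: i for i, c in enumerate(cosets) for x in c}
    return {i: where[(c[0] * k) % m] for i, c in enumerate(cosets)}


print(len(slot_cosets(17)))        # 2 slots, each of degree d = 8
print(coset_permutation(3, 17))    # {0: 1, 1: 0}
```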

For the purpose of implementing AES we will be specifically interested in arithmetic in $\mathbb{F}_{2^8}$ (represented as $\mathbb{F}_{2^8} = \mathbb{F}_2[X]/G(X)$ with $G(X) = X^8 + X^4 + X^3 + X + 1$). We choose the parameters so that $d$ is divisible by 8, so $\mathbb{F}_{2^d}$ includes $\mathbb{F}_{2^8}$ as a subfield. This lets us think of the plaintext space as containing $\ell$-vectors over $\mathbb{F}_{2^8}$.

A.2  Canonical  Embedding  Norm 

Following [22], we use as the “size” of a polynomial $a \in \mathbb{A}$ the $\ell_\infty$ norm of its canonical embedding. Recall that the canonical embedding of $a \in \mathbb{A}$ into $\mathbb{C}^{\phi(m)}$ is the $\phi(m)$-vector of complex numbers $\sigma(a) = (a(\zeta_m^i))_i$, where $\zeta_m$ is a complex primitive $m$-th root of unity and the indexes $i$ range over all of $(\mathbb{Z}/m\mathbb{Z})^*$. We call the norm of $\sigma(a)$ the canonical embedding norm of $a$, and denote it by

$$\|a\|_\infty^{\mathrm{can}} = \|\sigma(a)\|_\infty.$$

We will make use of the following properties of $\|\cdot\|_\infty^{\mathrm{can}}$:
• For all $a, b \in \mathbb{A}$ we have $\|a \cdot b\|_\infty^{\mathrm{can}} \le \|a\|_\infty^{\mathrm{can}} \cdot \|b\|_\infty^{\mathrm{can}}$.

• For all $a \in \mathbb{A}$ we have $\|a\|_\infty^{\mathrm{can}} \le \|a\|_1$.

• There is a ring constant $c_m$ (depending only on $m$) such that $\|a\|_\infty \le c_m \cdot \|a\|_\infty^{\mathrm{can}}$ for all $a \in \mathbb{A}$.
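The canonical embedding norm is easy to compute numerically for small $m$. The sketch below (hypothetical helper names; floating point and naive evaluation rather than an FFT) evaluates a coefficient vector at $\zeta_m^i$ for all $i \in (\mathbb{Z}/m\mathbb{Z})^*$ and checks the first two properties on random inputs. It multiplies polynomials in $\mathbb{Z}[X]$ without reducing modulo $\Phi_m(X)$, which does not change the embedding because $\Phi_m$ vanishes at every primitive $m$-th root of unity.

```python
import cmath
import random
from math import gcd


def canonical_embedding(coeffs, m):
    """sigma(a) = (a(zeta_m^i)) for i in (Z/mZ)^*, where a(X) = sum_k coeffs[k] X^k."""
    out = []
    for i in range(1, m):
        if gcd(i, m) == 1:
            z = cmath.exp(2j * cmath.pi * i / m)
            out.append(sum(c * z ** k for k, c in enumerate(coeffs)))
    return out


def can_norm(coeffs, m):
    """||a||_inf^can = max_i |a(zeta_m^i)|."""
    return max(abs(v) for v in canonical_embedding(coeffs, m))


def poly_mul(a, b):
    """Product in Z[X]."""
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] += x * y
    return r


m, n = 17, 16                      # toy example: phi(17) = 16 coefficients
rng = random.Random(1)
a = [rng.randint(-3, 3) for _ in range(n)]
b = [rng.randint(-3, 3) for _ in range(n)]
assert can_norm(poly_mul(a, b), m) <= can_norm(a, m) * can_norm(b, m) + 1e-6
assert can_norm(a, m) <= sum(abs(x) for x in a) + 1e-9
```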

The ring constant $c_m$ is defined by $c_m = \|\mathrm{CRT}_m^{-1}\|_\infty$, where $\mathrm{CRT}_m$ is the CRT matrix for $m$, i.e. the Vandermonde matrix over the complex primitive $m$-th roots of unity. Asymptotically the value $c_m$ can grow super-polynomially with $m$, but for the “small” values of $m$ one would use in practice, values of $c_m$ can be evaluated directly. See [11] for a discussion of $c_m$.

Canonical Reduction. When working with elements in $\mathbb{A}_q$ for some integer modulus $q$, we sometimes need a version of the canonical embedding norm that plays nicely with reduction modulo $q$. Following [15], we define the canonical embedding norm reduced modulo $q$ of an element $a \in \mathbb{A}$ as the smallest canonical embedding norm of any $a'$ which is congruent to $a$ modulo $q$. We denote it as

$$|a|_q^{\mathrm{can}} \stackrel{\mathrm{def}}{=} \min\{\, \|a'\|_\infty^{\mathrm{can}} : a' \in \mathbb{A},\ a' \equiv a \pmod q \,\}.$$

We sometimes also denote the polynomial where the minimum is obtained by $[a]_q^{\mathrm{can}}$, and call it the canonical reduction of $a$ modulo $q$. Neither the canonical embedding norm nor the canonical reduction is used in the scheme itself; it is only in the analysis of it that we will need them. We note that (trivially) we have $|a|_q^{\mathrm{can}} \le \|a\|_\infty^{\mathrm{can}}$, since $a' = a$ is one of the candidates in the minimum.


A.3  Double  CRT  Representation 

As noted in Section 2, we usually represent an element $a \in \mathbb{A}_q$ via double-CRT representation, with respect to both the polynomial factors of $\Phi_m(X)$ and the integer factors of $q$. Specifically, we assume that $\mathbb{Z}/q\mathbb{Z}$ contains a primitive $m$-th root of unity (call it $\zeta$), so $\Phi_m(X)$ factors modulo $q$ into linear terms, $\Phi_m(X) = \prod_{j \in (\mathbb{Z}/m\mathbb{Z})^*} (X - \zeta^j) \pmod q$. We also denote $q$'s prime factorization by $q = \prod_{i=0}^{t} p_i$. Then a polynomial $a \in \mathbb{A}_q$ is represented as the $(t+1) \times \phi(m)$ matrix of its evaluations at the roots of $\Phi_m(X)$ modulo $p_i$ for $i = 0, \ldots, t$:

$$\text{dble-CRT}^t(a) = \big( a(\zeta^j) \bmod p_i \big)_{0 \le i \le t,\ j \in (\mathbb{Z}/m\mathbb{Z})^*}.$$

The double-CRT representation can be computed using $t+1$ invocations of the FFT algorithm modulo the $p_i$, picking only the FFT coefficients which correspond to elements in $(\mathbb{Z}/m\mathbb{Z})^*$. To invert this representation we invoke the inverse FFT algorithm $t+1$ times on a vector of length $m$ consisting of the thinned-out values padded with zeros, then apply the Chinese Remainder Theorem, and then reduce modulo $\Phi_m(X)$ and $q$.

Addition and multiplication in $\mathbb{A}_q$ can be computed as component-wise addition and multiplication of the entries in the two tables (modulo the appropriate primes $p_i$):

$$\text{dble-CRT}^t(a + b) = \text{dble-CRT}^t(a) + \text{dble-CRT}^t(b),$$
$$\text{dble-CRT}^t(a \cdot b) = \text{dble-CRT}^t(a) \cdot \text{dble-CRT}^t(b).$$

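Also, for an element of the Galois group $\kappa_k \in \mathcal{G}al$ (which maps $a(X) \in \mathbb{A}$ to $a(X^k) \bmod \Phi_m(X)$), we can evaluate $\kappa_k(a)$ on the double-CRT representation of $a$ just by permuting the columns in the matrix: since $\kappa_k(a)$ evaluated at $\zeta^j$ equals $a(\zeta^{jk})$, column $j$ of the result is column $j \cdot k \bmod m$ of the original matrix.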
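A toy double-CRT table can be built with plain modular arithmetic. The sketch below uses hypothetical helper names, naive evaluation instead of the FFT, and toy primes $p \equiv 1 \pmod m$ rather than real parameters. It builds the $(t+1) \times \phi(m)$ table and lets one check that products are componentwise and that $\kappa_k$ only permutes columns.

```python
from math import gcd


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def primes_one_mod(m, count):
    """First `count` primes p = 1 (mod m); Z/pZ then has a primitive m-th root of unity."""
    ps, p = [], m + 1
    while len(ps) < count:
        if is_prime(p):
            ps.append(p)
        p += m
    return ps


def prime_factors(n):
    fs, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            fs.add(d)
            n //= d
        d += 1
    if n > 1:
        fs.add(n)
    return fs


def root_of_unity(m, p):
    """Some element of order exactly m in (Z/pZ)^*."""
    for g in range(2, p):
        z = pow(g, (p - 1) // m, p)
        if all(pow(z, m // r, p) != 1 for r in prime_factors(m)):
            return z
    raise ValueError("no primitive m-th root of unity mod p")


def dble_crt(coeffs, m, primes):
    """Row i holds a(zeta_i^j) mod p_i for j in (Z/mZ)^*."""
    units = [j for j in range(1, m) if gcd(j, m) == 1]
    rows = []
    for p in primes:
        z = root_of_unity(m, p)
        rows.append([sum(c * pow(z, j * k, p) for k, c in enumerate(coeffs)) % p
                     for j in units])
    return units, rows


def poly_mul(a, b):
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] += x * y
    return r


def kappa(a, k):
    """a(X) -> a(X^k) in Z[X]; reducing mod Phi_m would not change the table."""
    r = [0] * (k * (len(a) - 1) + 1)
    for i, c in enumerate(a):
        r[i * k] += c
    return r


primes = primes_one_mod(17, 3)     # [103, 137, 239]
```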
A.4 Sampling From $\mathbb{A}_q$

At various points we will need to sample from $\mathbb{A}_q$ with different distributions, as described below. We denote choosing the element $a \in \mathbb{A}$ according to distribution $\mathcal{D}$ by $a \leftarrow \mathcal{D}$. The distributions below are described as over $\phi(m)$-vectors, but we always consider them as distributions over the ring $\mathbb{A}$, by identifying a polynomial $a \in \mathbb{A}$ with its coefficient vector.

The uniform distribution $\mathcal{U}_q$: This is just the uniform distribution over $(\mathbb{Z}/q\mathbb{Z})^{\phi(m)}$, which we identify with $(\mathbb{Z} \cap (-q/2, q/2])^{\phi(m)}$. Note that it is easy to sample from $\mathcal{U}_q$ directly in double-CRT representation.

The “discrete Gaussian” $\mathcal{DG}_q(\sigma^2)$: Let $\mathcal{N}(0, \sigma^2)$ denote the normal (Gaussian) distribution on real numbers with zero mean and variance $\sigma^2$. We use drawing from $\mathcal{N}(0, \sigma^2)$ and rounding to the nearest integer as an approximation to the discrete Gaussian distribution. Namely, the distribution $\mathcal{DG}_q(\sigma^2)$ draws a real $\phi(m)$-vector according to $\mathcal{N}(0, \sigma^2)^{\phi(m)}$, rounds it to the nearest integer vector, and outputs that integer vector reduced modulo $q$ (into the interval $(-q/2, q/2]$).

Sampling small polynomials, $\mathcal{ZO}(\rho)$ and $\mathcal{HWT}(h)$: These distributions produce vectors in $\{0, \pm 1\}^{\phi(m)}$.

For a real parameter $\rho \in [0, 1]$, $\mathcal{ZO}(\rho)$ draws each entry in the vector from $\{0, \pm 1\}$, with probability $\rho/2$ for each of $-1$ and $+1$, and probability $1 - \rho$ of being zero.

For an integer parameter $h \le \phi(m)$, the distribution $\mathcal{HWT}(h)$ chooses a vector uniformly at random from $\{0, \pm 1\}^{\phi(m)}$, subject to the condition that it has exactly $h$ nonzero entries.
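The four distributions can be sampled with Python's standard `random` module. This is a minimal, non-cryptographic sketch (a real implementation would use a cryptographically secure generator); the function names are illustrative, and `n` stands for $\phi(m)$.

```python
import random


def center(x, q):
    """Reduce x into the interval (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def sample_uniform(n, q, rng):
    """U_q: n coefficients uniform over Z cap (-q/2, q/2]."""
    return [center(rng.randrange(q), q) for _ in range(n)]


def sample_dg(n, q, sigma, rng):
    """DG_q(sigma^2): round N(0, sigma^2) samples, then reduce into (-q/2, q/2]."""
    return [center(round(rng.gauss(0.0, sigma)), q) for _ in range(n)]


def sample_zo(n, rho, rng):
    """ZO(rho): -1 and +1 with probability rho/2 each, 0 with probability 1 - rho."""
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(-1 if u < rho / 2 else 1 if u < rho else 0)
    return out


def sample_hwt(n, h, rng):
    """HWT(h): uniform over {0, +-1}^n with exactly h nonzero entries."""
    v = [0] * n
    for i in rng.sample(range(n), h):
        v[i] = rng.choice((-1, 1))
    return v


rng = random.Random(2024)
s = sample_hwt(256, 64, rng)   # e.g. a secret key with Hamming weight 64
```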

A.5  Canonical  embedding  norm  of  random  polynomials 

In the coming sections we will need to bound the canonical embedding norm of polynomials that are produced by the distributions above, as well as products of such polynomials. In some cases it is possible to analyze the norm rigorously using Chernoff and Hoeffding bounds, but to set the parameters of our scheme we instead use a heuristic approach that yields better constants:

Let $a \in \mathbb{A}$ be a polynomial that was chosen by one of the distributions above, hence all the (nonzero) coefficients in $a$ are IID (independent and identically distributed). For a complex primitive $m$-th root of unity $\zeta_m$, the evaluation $a(\zeta_m)$ is the inner product between the coefficient vector of $a$ and the fixed vector $z_m = (1, \zeta_m, \zeta_m^2, \ldots)$, which has Euclidean norm exactly $\sqrt{\phi(m)}$. Hence the random variable $a(\zeta_m)$ has variance $V = \sigma^2 \phi(m)$, where $\sigma^2$ is the variance of each coefficient of $a$. Specifically, when $a \leftarrow \mathcal{U}_q$ each coefficient has variance $q^2/12$, so we get variance $V_U = q^2 \phi(m)/12$. When $a \leftarrow \mathcal{DG}_q(\sigma^2)$ we get variance $V_G \approx \sigma^2 \phi(m)$, and when $a \leftarrow \mathcal{ZO}(\rho)$ we get variance $V_Z = \rho\, \phi(m)$. When choosing $a \leftarrow \mathcal{HWT}(h)$ we get a variance of $V_H = h$ (and not $\phi(m)$, since $a$ has only $h$ nonzero coefficients).

Moreover, the random variable $a(\zeta_m)$ is a sum of many IID random variables, hence by the central limit theorem it is distributed similarly to a complex Gaussian random variable of the specified variance.⁵ We therefore use $6\sqrt{V}$ (i.e. six standard deviations) as a high-probability bound on the size of $a(\zeta_m)$. Since the evaluation of $a$ at all the roots of unity obeys the same bound, we use six standard deviations as our bound on the canonical embedding norm of $a$. (We chose six standard deviations since $\mathrm{erfc}(6) \approx 2^{-55}$, which is good enough for us even when using the union bound and multiplying it by $\phi(m) \approx 2^{16}$.)

In many cases we need to bound the canonical embedding norm of a product of two such “random polynomials”. In this case our task is to bound the magnitude of the product of two random variables, both distributed close to Gaussians, with variances $\sigma_a^2$ and $\sigma_b^2$, respectively. For this case we use $16\sigma_a\sigma_b$ as our bound, since $\mathrm{erfc}(4) \approx 2^{-25}$, so the probability that both variables exceed their standard deviation by more than a factor of four is roughly $2^{-50}$.

⁵ The mean of $a(\zeta_m)$ is zero, since the coefficients of $a$ are chosen from a zero-mean distribution.
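The heuristic is easy to sanity-check numerically. The sketch below uses illustrative names and a toy $m = 257$, so $\phi(m) = 256$. It computes the quoted tail constants with `math.erfc` and compares $|a(\zeta_m)|$ for samples $a \leftarrow \mathcal{HWT}(64)$ against the variance $V_H = h$ and the six-standard-deviation bound $6\sqrt{h} = 48$.

```python
import cmath
import math
import random

print(math.log2(math.erfc(6)))   # about -55.4
print(math.log2(math.erfc(4)))   # about -25.95


def six_sigma_bound(variance):
    return 6 * math.sqrt(variance)


def product_bound(var_a, var_b):
    return 16 * math.sqrt(var_a) * math.sqrt(var_b)


def sample_hwt(n, h, rng):
    v = [0] * n
    for i in rng.sample(range(n), h):
        v[i] = rng.choice((-1, 1))
    return v


m, n, h = 257, 256, 64
zeta = cmath.exp(2j * cmath.pi / m)
powers = [zeta ** k for k in range(n)]
rng = random.Random(5)
values = [sum(c * z for c, z in zip(sample_hwt(n, h, rng), powers)) for _ in range(1000)]
mean_sq = sum(abs(v) ** 2 for v in values) / len(values)   # empirical variance, near h
worst = max(abs(v) for v in values)                         # expected to stay below 48
```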

B  The  Basic  Scheme 

We now define our leveled HE scheme on $L$ levels, including the Modulus-Switching and Key-Switching operations and the procedures for KeyGen, Enc, Dec, and for Add, Mult, Scalar-Mult, and Automorphism.

Recall that a ciphertext vector $\mathbf{c}$ in the cryptosystem is a valid encryption of $a \in \mathbb{A}$ with respect to secret key $\mathbf{s}$ and modulus $q$ if $[[\langle \mathbf{c}, \mathbf{s} \rangle]_q]_2 = a$, where the inner product is over $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$, the operation $[\cdot]_q$ denotes modular reduction in coefficient representation into the interval $(-q/2, +q/2]$, and we require that the “noise” $[\langle \mathbf{c}, \mathbf{s} \rangle]_q$ is sufficiently small (in canonical embedding norm reduced mod $q$). In our implementation a “normal” ciphertext is a 2-vector $\mathbf{c} = (c_0, c_1)$, and a “normal” secret key is of the form $\mathbf{s} = (1, -s)$, hence decryption takes the form

$$a \leftarrow [c_0 - c_1 \cdot s]_q \bmod 2. \qquad (2)$$
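Equation (2) can be exercised on a toy ring. The sketch below illustrates only the decryption rule, under heavy simplifications that are our own assumptions: it uses the power-of-two cyclotomic $\Phi_{16}(X) = X^8 + 1$ instead of a general $\Phi_m$, a hypothetical modulus $q = 2^{20} + 7$, and a secret-key encryption $c_1 \leftarrow \mathcal{U}_q$, $c_0 = [c_1 s + 2e + a]_q$ with $e \in \{0, \pm 1\}^8$. It is neither secure nor the paper's Enc procedure.

```python
import random

N = 8                # toy ring Z[X]/(X^8 + 1), i.e. Phi_16 (assumption, not the paper's m)
Q = 2 ** 20 + 7      # hypothetical odd modulus


def center(x, q):
    """Reduce x into (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def ring_mul(a, b, q):
    """Product in Z_q[X]/(X^N + 1): negacyclic convolution, centered mod q."""
    r = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                r[i + j] += a[i] * b[j]
            else:
                r[i + j - N] -= a[i] * b[j]   # X^N = -1
    return [center(x, q) for x in r]


def keygen(rng, h=4):
    """Secret s with exactly h coefficients in {+-1} (an HWT(h)-style key)."""
    s = [0] * N
    for i in rng.sample(range(N), h):
        s[i] = rng.choice((-1, 1))
    return s


def encrypt(bits, s, rng):
    """Toy secret-key encryption: c0 - c1*s = bits + 2e (mod Q) with small e."""
    c1 = [rng.randrange(Q) for _ in range(N)]
    e = [rng.choice((-1, 0, 1)) for _ in range(N)]
    c1s = ring_mul(c1, s, Q)
    c0 = [center(c1s[i] + 2 * e[i] + bits[i], Q) for i in range(N)]
    return c0, c1


def decrypt(c, s):
    """Equation (2): a <- [c0 - c1*s]_q mod 2."""
    c0, c1 = c
    c1s = ring_mul(c1, s, Q)
    return [center(c0[i] - c1s[i], Q) % 2 for i in range(N)]


rng = random.Random(0)
s = keygen(rng)
msg = [1, 0, 1, 1, 0, 0, 1, 0]
assert decrypt(encrypt(msg, s, rng), s) == msg
```

Decryption works here because $[c_0 - c_1 s]_q = 2e + a$ exactly, a vector with entries of magnitude at most 3; reducing it modulo 2 removes the even noise term.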


B.1 Our Moduli Chain

We define the chain of moduli for our depth-$L$ homomorphic evaluation by choosing $L$ “small primes” $p_0, p_1, \ldots, p_{L-1}$, and the $t$-th modulus in our chain is defined as $q_t = \prod_{j=0}^{t} p_j$. (The sizes will be determined later.) The primes $p_i$ are chosen so that for all $i$, $\mathbb{Z}/p_i\mathbb{Z}$ contains a primitive $m$-th root of unity. Hence we can use our double-CRT representation for all $\mathbb{A}_{q_t}$.

This choice of moduli makes it easy to get a level-$(t-1)$ representation of $a \in \mathbb{A}$ from its level-$t$ representation. Specifically, given the level-$t$ double-CRT representation $\text{dble-CRT}^t(a)$ for some $a \in \mathbb{A}_{q_t}$, we can simply remove from the matrix the row corresponding to the last small prime $p_t$, thus obtaining a level-$(t-1)$ representation of $a \bmod q_{t-1} \in \mathbb{A}_{q_{t-1}}$. Similarly we can get the double-CRT representation for lower levels by removing more rows. By a slight abuse of notation we write $\text{dble-CRT}^{t'}(a) = \text{dble-CRT}^{t}(a) \bmod q_{t'}$ for $t' < t$.
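Dropping a level amounts to forgetting one residue per coefficient. The integer-level sketch below uses toy primes $p_i \equiv 1 \pmod{17}$ and hypothetical helper names. It shows that deleting the row for $p_t$ leaves exactly the residues of $x \bmod q_{t-1}$, which the Chinese Remainder Theorem recovers.

```python
from math import prod

PRIMES = [103, 137, 239]   # toy small primes, each = 1 (mod 17); not real parameters


def q(t):
    """q_t = p_0 * p_1 * ... * p_t."""
    return prod(PRIMES[: t + 1])


def residues(x, t):
    """The level-t residues of one coefficient: x mod p_i for i = 0..t."""
    return [x % p for p in PRIMES[: t + 1]]


def crt(res):
    """Reconstruct x mod prod(p_i) from its residues (standard CRT)."""
    ps = PRIMES[: len(res)]
    Q = prod(ps)
    x = 0
    for r, p in zip(res, ps):
        Qp = Q // p
        x += r * Qp * pow(Qp, -1, p)
    return x % Q


x = 1234567
level2 = residues(x, 2)
level1 = level2[:-1]              # drop the row for p_2
assert crt(level1) == x % q(1)
```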

Recall that encryption produces ciphertext vectors valid with respect to the largest modulus $q_{L-1}$ in our chain, and we obtain ciphertext vectors valid with respect to smaller moduli whenever we apply modulus switching to decrease the noise magnitude. As described in Section 3.3, our implementation dynamically adjusts levels, performing modulus switching when the dynamically-computed noise estimate becomes too large. Hence each ciphertext in our scheme is tagged with both its level $t$ (pinpointing the modulus $q_t$ relative to which this ciphertext is valid) and an estimate $\nu$ on the noise magnitude in this ciphertext. In other words, a ciphertext is a triple $(\mathbf{c}, t, \nu)$ with $0 \le t \le L-1$, $\mathbf{c}$ a vector over $\mathbb{A}_{q_t}$, and $\nu$ a real number which is used as our noise estimate.

B.2  Modulus  Switching 

The operation SwitchModulus($\mathbf{c}$) takes the ciphertext $\mathbf{c} = ((c_0, c_1), t, \nu)$ defined modulo $q_t$ and produces a ciphertext $\mathbf{c}' = ((c'_0, c'_1), t-1, \nu')$ defined modulo $q_{t-1}$, such that $[c_0 - s \cdot c_1]_{q_t} \equiv [c'_0 - s \cdot c'_1]_{q_{t-1}} \pmod 2$, and $\nu'$ is smaller than $\nu$. This procedure makes use of the function $\mathrm{Scale}(x, q, q')$ that takes an element $x \in \mathbb{A}_q$ and returns an element $y \in \mathbb{A}_{q'}$ such that in coefficient representation it holds that $y \equiv x \pmod 2$, and $y$ is the closest element to $(q'/q) \cdot x$ that satisfies this mod-2 condition.
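The coefficient-wise behaviour of Scale can be sketched with exact rational arithmetic. In this sketch (hypothetical function names, toy moduli), each coefficient $x$, given in centered representation, is mapped to the integer of the same parity that is nearest to $(q'/q) \cdot x$; the resulting rounding error is at most 1 in absolute value.

```python
import math
from fractions import Fraction


def scale_coeff(x, q, q_new):
    """Integer y with y = x (mod 2) that is closest to (q_new / q) * x."""
    z = Fraction(q_new * x, q)
    f = math.floor(z)
    candidates = [c for c in (f - 1, f, f + 1, f + 2) if (c - x) % 2 == 0]
    return min(candidates, key=lambda c: abs(c - z))


def scale(coeffs, q, q_new):
    """Apply scale_coeff to every coefficient of a polynomial."""
    return [scale_coeff(x, q, q_new) for x in coeffs]


q_big, q_small = 1009, 101
for x in range(-(q_big // 2), q_big // 2 + 1):
    y = scale_coeff(x, q_big, q_small)
    assert (y - x) % 2 == 0 and abs(y - Fraction(q_small * x, q_big)) <= 1
```

Among the four candidates $\lfloor z \rfloor - 1, \ldots, \lfloor z \rfloor + 2$ exactly two have the right parity, and one of them is always within distance 1 of $z$.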
To maintain the noise estimate, the procedure uses the pre-set ring constant $c_m$ (cf. Appendix A.2) and also a pre-set constant $B_{\mathrm{scale}}$, which is meant to bound the magnitude of the added noise term from this operation. It works as follows:

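SwitchModulus($(c_0, c_1), t, \nu$):

1. If $t < 1$ then abort; // Sanity check
2. $\nu' \leftarrow \frac{q_{t-1}}{q_t} \cdot \nu + B_{\mathrm{scale}}$; // Scale down the noise estimate
3. If $\nu' > q_{t-1}/(2c_m)$ then abort; // Another sanity check
4. $c'_i \leftarrow \mathrm{Scale}(c_i, q_t, q_{t-1})$ for $i = 0, 1$; // Scale down the vector
5. Output $((c'_0, c'_1), t-1, \nu')$.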

The constant $B_{\mathrm{scale}}$ is set as $B_{\mathrm{scale}} = 2\sqrt{\phi(m)/3} \cdot (8\sqrt{h} + 3)$, where $h$ is the Hamming weight of the secret key. (In our implementation we use $h = 64$, so we get $B_{\mathrm{scale}} \approx 77\sqrt{\phi(m)}$.) To justify this choice, we apply the proof of the modulus switching lemma from [15, Lemma 13] (in the full version), relative to the canonical embedding norm. In that proof it is shown that when the noise magnitude in the input ciphertext $\mathbf{c} = (c_0, c_1)$ is bounded by $\nu$, then the noise magnitude in the output vector $\mathbf{c}' = (c'_0, c'_1)$ is bounded by $\nu' = \frac{q_{t-1}}{q_t} \cdot \nu + \|\langle \mathbf{s}, \mathbf{r} \rangle\|_\infty^{\mathrm{can}}$, provided that the last quantity is smaller than $q_{t-1}/2$.

Above, $\mathbf{r}$ is the “rounding error” vector, namely $\mathbf{r} \stackrel{\mathrm{def}}{=} (r_0, r_1) = (c'_0, c'_1) - \frac{q_{t-1}}{q_t}(c_0, c_1)$. Heuristically assuming that $\mathbf{r}$ behaves as if its coefficients are chosen uniformly in $[-1, +1]$, the evaluation $r_i(\zeta_m)$ at an $m$-th root of unity $\zeta_m$ is distributed close to a complex Gaussian with variance $\phi(m)/3$. Also, $s$ was drawn from $\mathcal{HWT}(h)$, so $s(\zeta_m)$ is distributed close to a complex Gaussian with variance $h$. Hence we expect $r_1(\zeta_m)s(\zeta_m)$ to have magnitude at most $16\sqrt{\phi(m)/3 \cdot h}$ (recall that we use $h = 64$). We can similarly bound $r_0(\zeta_m)$ by $6\sqrt{\phi(m)/3}$, and therefore the evaluation of $\langle \mathbf{s}, \mathbf{r} \rangle$ at $\zeta_m$ is bounded in magnitude (whp) by:

$$16\sqrt{\phi(m)/3 \cdot h} + 6\sqrt{\phi(m)/3} = 2\sqrt{\phi(m)/3} \cdot (8\sqrt{h} + 3) \approx 77\sqrt{\phi(m)} = B_{\mathrm{scale}} \qquad (3)$$
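The constant in Eq. (3) can be checked with a few lines of arithmetic. This sketch (illustrative function names) evaluates both sides of Eq. (3) and the ratio $B_{\mathrm{scale}}/\sqrt{\phi(m)}$, which for $h = 64$ is $2 \cdot 67/\sqrt{3} \approx 77.36$ regardless of $m$; the value $\phi(m) = 2^{16}$ is just an example.

```python
import math


def b_scale(phi_m, h):
    """B_scale = 2 * sqrt(phi(m)/3) * (8*sqrt(h) + 3), the right-hand side of Eq. (3)."""
    return 2 * math.sqrt(phi_m / 3) * (8 * math.sqrt(h) + 3)


def two_term_bound(phi_m, h):
    """16*sqrt(phi(m)/3 * h) + 6*sqrt(phi(m)/3), the left-hand side of Eq. (3)."""
    return 16 * math.sqrt(phi_m / 3 * h) + 6 * math.sqrt(phi_m / 3)


phi_m, h = 2 ** 16, 64
print(b_scale(phi_m, h) / math.sqrt(phi_m))   # about 77.36
```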

B.3  Key  Switching 

After some homomorphic evaluation operations we have on our hands not a “normal” ciphertext which is valid relative to a “normal” secret key, but rather an “extended ciphertext” $((d_0, d_1, d_2), t, \nu)$ which is valid with respect to an “extended secret key” $\mathbf{s}' = (1, -s, -s')$. Namely, this ciphertext encrypts the plaintext $a \in \mathbb{A}$ via

$$a = [[d_0 - s \cdot d_1 - s' \cdot d_2]_{q_t}]_2,$$

and the magnitude of the noise $[d_0 - s \cdot d_1 - d_2 \cdot s']_{q_t}$ is bounded by $\nu$. In our implementation, the component $s$ is always the same element $s \in \mathbb{A}$ that was drawn from $\mathcal{HWT}(h)$ during key generation, but $s'$ can vary depending on the operation. (See the description of multiplication and automorphisms below.)

To translate such an extended ciphertext back into a normal one, we use some “key switching matrices” that are included in the public key. (In our implementation these “matrices” have dimension $2 \times 1$, i.e., they consist of only two elements from $\mathbb{A}$.) As explained in Section 3.1, we save on space and time by artificially “boosting” the modulus we use from $q_t$ up to $P \cdot q_t$ for some “large” modulus $P$. We note that in order to represent elements in $\mathbb{A}_{Pq_t}$ using our double-CRT representation, we need to choose $P$ so that $\mathbb{Z}/P\mathbb{Z}$ also has primitive $m$-th roots of unity. (In fact, in our implementation we pick $P$ to be a prime.)


The key-switching “matrix”. Denote by $Q = P\cdot q_{L-2}$ the largest modulus relative to which we need to generate key-switching matrices. To generate the key-switching matrix from $\mathbf{s}' = (1, -s, -s')$ to $\mathbf{s} = (1, -s)$ (note that both keys share the same element $s$), we choose two elements, one uniform and the other from our “discrete Gaussian”:

$$a_{s,s'} \leftarrow \mathcal{U}_Q \quad\text{and}\quad e_{s,s'} \leftarrow \mathcal{DG}_Q(\sigma^2),$$

where the variance is $\sigma^2$ for a global parameter $\sigma$ (that we later set as $\sigma = 3.2$). The “key-switching matrix” then consists of the single column vector

$$W[\mathbf{s}' \to \mathbf{s}] = \begin{pmatrix} b_{s,s'} \\ a_{s,s'} \end{pmatrix}, \quad\text{where}\quad b_{s,s'} \stackrel{\mathrm{def}}{=} [s\cdot a_{s,s'} + 2e_{s,s'} - P\cdot s']_Q. \qquad (4)$$

(With this sign, $b_{s,s'} - s\cdot a_{s,s'} = 2e_{s,s'} - P\cdot s'$, which matches the extended key $\mathbf{s}' = (1, -s, -s')$ in the correctness argument below.)

Note that $W$ above is defined modulo $Q = Pq_{L-2}$, but we need to use it relative to $Q_t = Pq_t$ for whatever the current level $t$ is. Hence, before applying the key-switching procedure at level $t$, we reduce $W$ modulo $Q_t$ to get $W_t \stackrel{\mathrm{def}}{=} [W]_{Q_t}$. Since $Q_t$ divides $Q$, $W_t$ is indeed a key-switching matrix. Namely, it is of the form $(b, a)^T$ with $a \in \mathcal{U}_{Q_t}$ and $b = [s\cdot a + 2e_{s,s'} - P\cdot s']_{Q_t}$ (with respect to the same element $e_{s,s'} \in \mathbb{A}$ from above).
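The reduction step can be checked on a toy instance. The Python sketch below is illustrative only, and it makes several assumptions:

- it works in the negacyclic ring $\mathbb{Z}[X]/(X^{16}+1)$ instead of $\mathbb{Z}[X]/\Phi_m(X)$;
- it uses small hypothetical moduli;
- it takes $s' = s^2$;
- it uses the sign convention $b = [s\cdot a + 2e - P\cdot s']$.

It builds $W$ modulo $Q$ and checks that every reduction $W_t = [W]_{Q_t}$ still satisfies $b_t - s\cdot a_t + P\cdot s' \equiv 2e \pmod{Q_t}$ with the same $e$.

```python
import random

n, h, sigma = 16, 8, 3.2
primes = [12289, 40961, 65537]        # hypothetical p_0, p_1, p_2 (so q_{L-2} is their product)
P = 786433                            # hypothetical odd "boosting" modulus
q_chain = [1]
for p in primes:
    q_chain.append(q_chain[-1] * p)   # q_chain[t + 1] = q_t = p_0 * ... * p_t
Q = P * q_chain[-1]                   # Q = P * q_{L-2}

def ring_mul(f, g, q):
    """Product f*g in Z_q[X]/(X^n + 1) via negacyclic convolution."""
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] += fi * gj
            else:
                out[i + j - n] -= fi * gj
    return [x % q for x in out]

random.seed(2)
s = [0] * n
for i in random.sample(range(n), h):  # HWT(h): h random positions with random signs
    s[i] = random.choice((-1, 1))
s_prime = ring_mul(s, s, Q)           # s' = s^2 (the multiplication key)

a = [random.randrange(Q) for _ in range(n)]
e = [round(random.gauss(0, sigma)) for _ in range(n)]
sa = ring_mul(s, a, Q)
b = [(sa[i] + 2 * e[i] - P * s_prime[i]) % Q for i in range(n)]

def reduce_matrix(Q_t):
    """W_t = [W]_{Q_t}: reduce both entries of W modulo Q_t."""
    return [x % Q_t for x in b], [x % Q_t for x in a]

for q_t in q_chain[1:]:
    Q_t = P * q_t
    b_t, a_t = reduce_matrix(Q_t)
    sa_t = ring_mul(s, a_t, Q_t)
    lhs = [(b_t[i] - sa_t[i] + P * s_prime[i]) % Q_t for i in range(n)]
    assert lhs == [(2 * x) % Q_t for x in e]
print("W_t is a valid key-switching matrix at every level")
```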


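The SwitchKey procedure. Given the extended ciphertext $c = ((d_0, d_1, d_2), t, \nu)$ and the key-switching matrix $W_t = (b, a)^T$, the procedure $\mathsf{SwitchKey}_{W_t}(c)$ proceeds as follows:$^6$

$\mathsf{SwitchKey}_{(b,a)}((d_0, d_1, d_2), t, \nu)$:

1. Set $\begin{pmatrix} c'_0 \\ c'_1 \end{pmatrix} \leftarrow \begin{pmatrix} P d_0 & b \\ P d_1 & a \end{pmatrix}\begin{pmatrix} 1 \\ d_2 \end{pmatrix} \bmod Q_t$;   // The actual key-switching operation
2. $c''_i \leftarrow \mathsf{Scale}(c'_i, Q_t, q_t)$ for $i = 0, 1$;   // Scale the vector back down to $q_t$
3. $\nu' \leftarrow \nu + B_{\mathrm{KS}}\cdot q_t/P + B_{\mathrm{scale}}$;   // The constant $B_{\mathrm{KS}}$ is determined below
4. Output $((c''_0, c''_1), t, \nu')$.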


To argue correctness, observe the following. The “actual key-switching operation” from above looks superficially different from the standard key-switching operation $\mathbf{c}' \leftarrow W\cdot\mathbf{c}$. However, it is merely an optimization that takes advantage of the fact that both vectors $\mathbf{s}'$ and $\mathbf{s}$ share the element $s$. Indeed, we have the equality over $\mathbb{A}_{Q_t}$:

$$\begin{aligned}
c'_0 - s\cdot c'_1 &= \big[(P\cdot d_0) + d_2\cdot b_{s,s'}\big] - s\cdot\big[(P\cdot d_1) + d_2\cdot a_{s,s'}\big]\\
&= P\cdot(d_0 - s\cdot d_1 - s'd_2) + 2\cdot d_2\cdot e_{s,s'}.
\end{aligned}$$

So as long as both sides are smaller than $Q_t$, we have the same equality also over $\mathbb{A}$ (without the mod-$Q_t$ reduction). Using the fact that the prime $P$ is odd, this means that we get

$$[c'_0 - s\cdot c'_1]_{Q_t} = [P\cdot(d_0 - s\cdot d_1 - s'd_2) + 2\cdot d_2\cdot e_{s,s'}]_{Q_t} \equiv [d_0 - s\cdot d_1 - s'd_2]_{q_t} \pmod 2.$$
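The whole procedure, including the final Scale step, can be exercised on a toy instance. The Python sketch below makes the same illustrative assumptions as before:

- the negacyclic ring $\mathbb{Z}[X]/(X^{16}+1)$ in place of $\mathbb{Z}[X]/\Phi_m(X)$;
- hypothetical small moduli;
- $s' = s^2$;
- $b = [s\cdot a + 2e - P\cdot s']_{Q_t}$.

The sketch proceeds in three stages:

1. It checks the identity for $c'_0 - s\cdot c'_1$ exactly modulo $Q_t$.
2. It checks the parity claim.
3. It scales by $P$ and decrypts modulo $q_t$. The scaling adds an even $\delta \equiv -c' \bmod P$, which is a simple coefficient-wise stand-in for the dble-CRT Scale routine.

```python
import random

n, h, sigma = 16, 8, 3.2
q_t = 12289 * 40961                 # hypothetical current modulus q_t
P = 786433                          # hypothetical odd boosting modulus
Q_t = P * q_t

def ring_mul(f, g, q):
    """Product f*g in Z_q[X]/(X^n + 1) via negacyclic convolution."""
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] += fi * gj
            else:
                out[i + j - n] -= fi * gj
    return [x % q for x in out]

def centered(f, q):
    """Representatives of f modulo q in the range (-q/2, q/2]."""
    return [x % q - q if x % q > q // 2 else x % q for x in f]

def scale_by_P(c):
    """(c + delta) / P with delta = -c mod P and delta even, reduced mod q_t."""
    out = []
    for ci in c:
        delta = (-ci) % P
        if delta % 2:                 # P is odd, so delta - P is even
            delta -= P
        out.append(((ci + delta) // P) % q_t)
    return out

random.seed(3)
s = [0] * n
for i in random.sample(range(n), h):
    s[i] = random.choice((-1, 1))
s2 = centered(ring_mul(s, s, Q_t), Q_t)      # s' = s^2 with small integer coefficients

# Key-switching matrix W_t = (b, a) modulo Q_t
a = [random.randrange(Q_t) for _ in range(n)]
e = [round(random.gauss(0, sigma)) for _ in range(n)]
sa = ring_mul(s, a, Q_t)
b = [(sa[i] + 2 * e[i] - P * s2[i]) % Q_t for i in range(n)]

# Extended ciphertext (d0, d1, d2) mod q_t whose noise has the message bits as parity
m = [random.randrange(2) for _ in range(n)]
noise = [m[i] + 2 * random.randint(-5, 5) for i in range(n)]
d1 = [random.randrange(q_t) for _ in range(n)]
d2 = centered([random.randrange(q_t) for _ in range(n)], q_t)
sd1, s2d2 = ring_mul(s, d1, q_t), ring_mul(s2, d2, q_t)
d0 = [(noise[i] + sd1[i] + s2d2[i]) % q_t for i in range(n)]

# Step 1: (c0', c1') = (P*d0 + d2*b, P*d1 + d2*a) mod Q_t
d2b, d2a = ring_mul(d2, b, Q_t), ring_mul(d2, a, Q_t)
c0 = [(P * d0[i] + d2b[i]) % Q_t for i in range(n)]
c1 = [(P * d1[i] + d2a[i]) % Q_t for i in range(n)]

# Identity: c0' - s*c1' = P*(d0 - s*d1 - s'*d2) + 2*d2*e  (mod Q_t)
sc1, d2e = ring_mul(s, c1, Q_t), ring_mul(d2, e, Q_t)
lhs = [(c0[i] - sc1[i]) % Q_t for i in range(n)]
rhs = [(P * (d0[i] - sd1[i] - s2d2[i]) + 2 * d2e[i]) % Q_t for i in range(n)]
assert lhs == rhs
assert [x % 2 for x in centered(lhs, Q_t)] == m   # parity survives because P is odd

# Steps 2 and 4: scale back down to q_t, then decrypt the normal ciphertext
cc0, cc1 = scale_by_P(c0), scale_by_P(c1)
scc1 = ring_mul(s, cc1, q_t)
decrypted = [x % 2 for x in centered([(cc0[i] - scc1[i]) % q_t for i in range(n)], q_t)]
print("decrypts correctly:", decrypted == m)
```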

To analyze the size of the added term $2d_2e_{s,s'}$, we can assume heuristically that $d_2$ behaves like a uniform polynomial drawn from $\mathcal{U}_{q_t}$. Hence, for a complex primitive $m$-th root of unity $\zeta_m$, $d_2(\zeta_m)$ is distributed close to a complex Gaussian with variance $q_t^2\phi(m)/12$. Similarly, $e_{s,s'}(\zeta_m)$ is distributed close to a complex Gaussian with variance $\sigma^2\phi(m)$. So $2d_2(\zeta_m)e_{s,s'}(\zeta_m)$ can be modeled as a product of two Gaussians, and we expect that with overwhelming probability it remains smaller than $2\cdot 16\cdot\sqrt{q_t^2\phi(m)/12\cdot\sigma^2\phi(m)} = \frac{16}{\sqrt{3}}\cdot\sigma q_t\phi(m)$. This yields a heuristic bound $\frac{16}{\sqrt{3}}\cdot\sigma\phi(m)\cdot q_t = B_{\mathrm{KS}}\cdot q_t$ on the canonical embedding norm of the added noise term. If the total noise magnitude does not exceed $Q_t/2c_m$, then everything also remains below $Q_t/2$ in coefficient representation. Thus our constant $B_{\mathrm{KS}}$ is set as

$$B_{\mathrm{KS}} \stackrel{\mathrm{def}}{=} \frac{16\sigma\phi(m)}{\sqrt{3}} \approx 9\sigma\phi(m). \qquad (5)$$

Finally, dividing by $P$ (which is the effect of the Scale operation), we obtain the final ciphertext that we require. The noise magnitude is divided by $P$ (except for the added $B_{\mathrm{scale}}$ term).

$^6$For simplicity, we describe the SwitchKey procedure as if it always switches back to mod-$q_t$. In reality, if the noise estimate is large enough, it can switch directly to $q_{t-1}$ instead.
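The Python sketch below gives a numeric illustration of Equation (5) and of step 3 of SwitchKey. It computes $B_{\mathrm{KS}}$ and the updated noise estimate $\nu' = \nu + B_{\mathrm{KS}}\cdot q_t/P + B_{\mathrm{scale}}$. It also runs a small Monte Carlo experiment supporting the product-of-Gaussians heuristic: for complex Gaussians $X, Y$ with variances $V_X, V_Y$, the product $|XY|$ essentially never exceeds $16\sqrt{V_X V_Y}$. The parameter values passed in are hypothetical examples.

```python
import math
import random

def b_ks(phi_m: float, sigma: float = 3.2) -> float:
    """Equation (5): B_KS = 16 * sigma * phi(m) / sqrt(3)."""
    return 16 * sigma * phi_m / math.sqrt(3)

def b_scale(phi_m: float, h: int = 64) -> float:
    """Equation (3): B_scale = 2 * sqrt(phi(m)/3) * (8*sqrt(h) + 3)."""
    return 2 * math.sqrt(phi_m / 3) * (8 * math.sqrt(h) + 3)

def switch_key_noise(nu, phi_m, q_t, P, sigma=3.2, h=64):
    """Step 3 of SwitchKey: nu' = nu + B_KS * q_t / P + B_scale."""
    return nu + b_ks(phi_m, sigma) * q_t / P + b_scale(phi_m, h)

def complex_gaussian(variance):
    """Sample X with E|X|^2 = variance (independent real and imaginary parts)."""
    sd = math.sqrt(variance / 2)
    return complex(random.gauss(0, sd), random.gauss(0, sd))

random.seed(5)
worst = max(abs(complex_gaussian(4.0) * complex_gaussian(9.0)) / math.sqrt(4.0 * 9.0)
            for _ in range(20000))
print("B_KS / (sigma * phi(m)) =", round(16 / math.sqrt(3), 3))
print("largest |XY| / sqrt(Vx * Vy) over 20000 samples:", round(worst, 2))
print("nu' =", switch_key_noise(nu=1000.0, phi_m=10752, q_t=2**200, P=2**220))
```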

B.4 Key-Generation, Encryption, and Decryption

The procedures below depend on many parameters: $h$, $\sigma$, $m$, the primes $p_i$ and $P$, etc. These parameters will be determined later.

KeyGen(): Given the parameters, the key generation procedure chooses a low-weight secret key and then generates an LWE instance relative to that secret key. Namely, we choose

$$s \leftarrow \mathcal{HWT}(h), \quad a \leftarrow \mathcal{U}_{q_{L-1}}, \quad\text{and}\quad e \leftarrow \mathcal{DG}_{q_{L-1}}(\sigma^2).$$

It then sets the secret key as $s$ and the public key as $(a, b)$, where $b = [a\cdot s + 2e]_{q_{L-1}}$.

In addition, the key generation procedure adds to the public key some key-switching “matrices”, as described in Appendix B.3. Specifically, it adds:

- the matrix $W[s^2 \to s]$, for use in multiplication;
- some matrices $W[\kappa_i(s) \to s]$, for use in automorphisms, for $\kappa_i \in \mathrm{Gal}$ whose indexes generate $(\mathbb{Z}/m\mathbb{Z})^*$ (including in particular $\kappa_2$).

$\mathsf{Enc}_{pk}(m)$: To encrypt an element $m \in \mathbb{A}_2$, we choose one “small polynomial” (with $0, \pm 1$ coefficients) and two Gaussian polynomials (with variance $\sigma^2$):

$$v \leftarrow \mathcal{ZO}(0.5) \quad\text{and}\quad e_0, e_1 \leftarrow \mathcal{DG}_{q_{L-1}}(\sigma^2).$$

Then we set $c_0 = b\cdot v + 2\cdot e_0 + m$ and $c_1 = a\cdot v + 2\cdot e_1$, and set the initial ciphertext as $c' = (c_0, c_1, L-1, B_{\mathrm{clean}})$, where $B_{\mathrm{clean}}$ is a parameter that we determine below.

The noise magnitude in this ciphertext ($B_{\mathrm{clean}}$) is a little larger than what we would like, so before we start computing on it we do one modulus switch. That is, the encryption procedure sets $c \leftarrow \mathsf{SwitchModulus}(c')$ and outputs $c$. We can deduce a value for $B_{\mathrm{clean}}$ as follows:

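$$\begin{aligned}
\big\|[c_0 - s\cdot c_1]_{q_{L-1}}\big\|^{\mathrm{can}} &= \big\|[(a\cdot s + 2\cdot e)\cdot v + 2\cdot e_0 + m - (a\cdot v + 2\cdot e_1)\cdot s]_{q_{L-1}}\big\|^{\mathrm{can}}\\
&= \|m + 2\cdot(e\cdot v + e_0 - e_1\cdot s)\|^{\mathrm{can}}\\
&\le \|m\|^{\mathrm{can}} + 2\cdot\big(\|e\cdot v\|^{\mathrm{can}} + \|e_0\|^{\mathrm{can}} + \|e_1\cdot s\|^{\mathrm{can}}\big)
\end{aligned}$$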

Using our complex Gaussian heuristic from Appendix A.5, we can bound the canonical embedding norm of the randomized terms above by

$$\|e\cdot v\|^{\mathrm{can}} \le 16\sigma\phi(m)/\sqrt{2}, \qquad \|e_0\|^{\mathrm{can}} \le 6\sigma\sqrt{\phi(m)}, \qquad \|e_1\cdot s\|^{\mathrm{can}} \le 16\sigma\sqrt{h\cdot\phi(m)}.$$

Also, the norm of the input message $m$ is clearly bounded by $\phi(m)$. Hence, when we substitute our parameters $h = 64$ and $\sigma = 3.2$, we get the bound

$$\phi(m) + 32\sigma\phi(m)/\sqrt{2} + 12\sigma\sqrt{\phi(m)} + 32\sigma\sqrt{h\cdot\phi(m)} \approx 74\phi(m) + 858\sqrt{\phi(m)} = B_{\mathrm{clean}} \qquad (6)$$
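Plugging $h = 64$ and $\sigma = 3.2$ into the left-hand side of Equation (6) gives the two coefficients directly. The short Python computation below (illustrative only) shows two things:

- the coefficient of $\phi(m)$ is about 73.4, which the text rounds up to 74;
- the coefficient of $\sqrt{\phi(m)}$ is 857.6, rounded to 858.

```python
import math

h, sigma = 64, 3.2
coef_phi = 1 + 32 * sigma / math.sqrt(2)                  # from ||m|| + 2*||e*v||
coef_sqrt_phi = 12 * sigma + 32 * sigma * math.sqrt(h)    # from 2*||e0|| + 2*||e1*s||

def b_clean(phi_m: float) -> float:
    """Rounded bound of Equation (6): 74*phi(m) + 858*sqrt(phi(m))."""
    return 74 * phi_m + 858 * math.sqrt(phi_m)

print(round(coef_phi, 2), round(coef_sqrt_phi, 2))
```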

Our goal in the initial modulus switching from $q_{L-1}$ to $q_{L-2}$ is to reduce the noise from its initial level of $B_{\mathrm{clean}} = \Theta(\phi(m))$ to our baseline bound of $B = \Theta(\sqrt{\phi(m)})$, which is determined in Equation (12) below.

$\mathsf{Dec}_{sk}(c)$: Decryption of a ciphertext $(c_0, c_1, t, \nu)$ at level $t$ is performed in three steps:

1. set $m' \leftarrow [c_0 - s\cdot c_1]_{q_t}$;
2. convert $m'$ to coefficient representation;
3. output $m' \bmod 2$.

This procedure works when $c_m\cdot\nu < q_t/2$. It therefore only applies when the constant $c_m$ for the field $\mathbb{A}$ is known and relatively small (which, as we mentioned above, will be true for all practical parameters). Also, we must pick the smallest prime $q_0 = p_0$ large enough, as described in Appendix C.2.
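The three procedures above can be exercised end to end on a toy instance. The Python sketch below is not the implementation described here. It differs in several ways:

- it uses the negacyclic ring $\mathbb{Z}[X]/(X^{16}+1)$ instead of $\mathbb{Z}[X]/\Phi_m(X)$;
- it works directly on coefficient vectors (no dble-CRT);
- it uses a single hypothetical modulus $q$;
- it samples $\mathcal{DG}(\sigma^2)$ by rounding a continuous Gaussian;
- it skips the initial SwitchModulus.

All function names are hypothetical.

```python
import random

n, h, sigma = 16, 8, 3.2
q = (1 << 40) + 15                    # hypothetical single modulus standing in for q_{L-1}

def ring_mul(f, g):
    """Product f*g in Z_q[X]/(X^n + 1) via negacyclic convolution."""
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] += fi * gj
            else:
                out[i + j - n] -= fi * gj
    return [x % q for x in out]

def hwt(weight):                      # HWT(h): exactly `weight` nonzero +-1 coefficients
    f = [0] * n
    for i in random.sample(range(n), weight):
        f[i] = random.choice((-1, 1))
    return f

def zo(p=0.5):                        # ZO(p): +-1 with probability p/2 each, else 0
    return [random.choice((-1, 1)) if random.random() < p else 0 for _ in range(n)]

def dg():                             # rounded-Gaussian stand-in for DG(sigma^2)
    return [round(random.gauss(0, sigma)) for _ in range(n)]

def keygen():
    s, a, e = hwt(h), [random.randrange(q) for _ in range(n)], dg()
    b = [(x + 2 * y) % q for x, y in zip(ring_mul(a, s), e)]      # b = [a*s + 2e]_q
    return s, (a, b)

def encrypt(pk, msg):
    a, b = pk
    v, e0, e1 = zo(), dg(), dg()
    c0 = [(x + 2 * y + z) % q for x, y, z in zip(ring_mul(b, v), e0, msg)]
    c1 = [(x + 2 * y) % q for x, y in zip(ring_mul(a, v), e1)]
    return c0, c1

def decrypt(s, ct):
    c0, c1 = ct
    diff = [(x - y) % q for x, y in zip(c0, ring_mul(s, c1))]      # [c0 - s*c1]_q
    return [(x - q if x > q // 2 else x) % 2 for x in diff]          # centered lift, then mod 2

random.seed(11)
sk, pk = keygen()
msg = [random.randrange(2) for _ in range(n)]
print(decrypt(sk, encrypt(pk, msg)) == msg)
```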

B.5  Homomorphic  Operations 

Add$(c, c')$: Suppose we are given two ciphertexts $c = ((c_0, c_1), t, \nu)$ and $c' = ((c'_0, c'_1), t', \nu')$, representing messages $m, m' \in \mathbb{A}_2$. This algorithm forms a ciphertext $c_a = ((a_0, a_1), t_a, \nu_a)$ which encrypts the message $m_a = m + m'$.

If the two ciphertexts do not belong to the same level, then we reduce the one with the larger modulus modulo the smaller of the two moduli, thus bringing them to the same level. This simple modular reduction works as long as the noise magnitude is smaller than the smaller of the two moduli. If this condition does not hold, then we need to do modulus switching rather than simple modular reduction. Once the two ciphertexts are at the same level (call it $t''$), we just add the two ciphertext vectors and the two noise estimates to get

$$c_a = \big(([c_0 + c'_0]_{q_{t''}},\ [c_1 + c'_1]_{q_{t''}}),\ t'',\ \nu + \nu'\big).$$

Mult$(c, c')$: Given two ciphertexts representing messages $m, m' \in \mathbb{A}_2$, this algorithm forms a ciphertext that encrypts the message $m\cdot m'$.

The algorithm proceeds in three stages:

1. We ensure that the noise magnitude in both ciphertexts is smaller than the preset constant $B$, performing modulus switching as needed. ($B$ is our baseline bound and is determined in Equation (12) below.)
2. We bring both ciphertexts to the same level by reducing modulo the smaller of the two moduli (if needed).
3. We form the extended ciphertext (essentially the tensor product of the two) and apply key switching to get back a normal ciphertext.

A pseudo-code description of this procedure is given below.

Mult$(c, c')$:

1. While $\nu(c) > B$ do $c \leftarrow \mathsf{SwitchModulus}(c)$;   // $\nu(c)$ is the noise estimate in $c$
2. While $\nu(c') > B$ do $c' \leftarrow \mathsf{SwitchModulus}(c')$;   // $\nu(c')$ is the noise estimate in $c'$
3. Bring $c, c'$ to the same level $t$ by reducing modulo the smaller of the two moduli.
   Denote after modular reduction $c = ((c_0, c_1), t, \nu)$ and $c' = ((c'_0, c'_1), t, \nu')$.
4. Set $(d_0, d_1, d_2) \leftarrow (c_0\cdot c'_0,\ c_1\cdot c'_0 + c_0\cdot c'_1,\ -c_1\cdot c'_1)$;
   Denote $c'' = ((d_0, d_1, d_2), t, \nu\cdot\nu')$.
5. Output $\mathsf{SwitchKey}_{W[s^2 \to s]}(c'')$.   // Convert to a “normal” ciphertext
We stress that the only place where we force modulus switching is before the multiplication operation. In all other operations we allow the noise to grow, and it is reduced back the first time the ciphertext is input to a multiplication operation. We also note that we may need to apply modulus switching more than once before the noise is small enough.
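The control flow of Mult, and how it keeps noise estimates in check, can be sketched with bookkeeping alone. The Python code below tracks only a level and a noise estimate per ciphertext (no ring elements). Its hypothetical parameters are:

- $\phi(m) = 10752$ and $\xi = 8$;
- every prime $p_t$ slightly above $4\xi B_{\mathrm{scale}}$, as in Equation (11);
- $B = p_t/(2\xi)$, as in Equation (12);
- an assumed ratio $q_t/P = 2^{-20}$ for the key-switching term.

It also assumes that SwitchModulus scales the estimate by $1/p_t$ and adds $B_{\mathrm{scale}}$, as in Appendix B.2.

```python
import math
from dataclasses import dataclass

PHI_M, XI, SIGMA, H = 10752, 8, 3.2, 64
B_SCALE = 2 * math.sqrt(PHI_M / 3) * (8 * math.sqrt(H) + 3)   # Equation (3)
B_KS = 16 * SIGMA * PHI_M / math.sqrt(3)                        # Equation (5)
P_T = 1.01 * 4 * XI * B_SCALE      # hypothetical size of each prime p_t
B = P_T / (2 * XI)                 # baseline bound, Equation (12)
QT_OVER_P = 2.0 ** -20             # hypothetical ratio q_t / P

@dataclass
class Ct:
    level: int      # t
    noise: float    # noise estimate nu

def switch_modulus(c: Ct) -> Ct:
    return Ct(c.level - 1, c.noise / P_T + B_SCALE)

def reduce_to(c: Ct, level: int) -> Ct:
    return Ct(level, c.noise)      # plain modular reduction leaves nu unchanged

def mult(c: Ct, d: Ct) -> Ct:
    while c.noise > B:             # step 1
        c = switch_modulus(c)
    while d.noise > B:             # step 2
        d = switch_modulus(d)
    t = min(c.level, d.level)      # step 3
    c, d = reduce_to(c, t), reduce_to(d, t)
    extended_noise = c.noise * d.noise                               # step 4: nu * nu'
    return Ct(t, extended_noise + B_KS * QT_OVER_P + B_SCALE)       # step 5: SwitchKey

out = mult(Ct(5, 3 * B), Ct(5, B))
print(out.level, out.noise > B, switch_modulus(out).noise < B)
```

The product noise is left large on purpose: as stated above, it is reduced only when the result next enters a multiplication.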

Scalar-Mult$(c, \alpha)$: Suppose we are given a ciphertext $c = (c_0, c_1, t, \nu)$ representing the message $m$, and an element $\alpha \in \mathbb{A}_2$ (represented as a polynomial modulo 2 with coefficients in $\{-1, 0, 1\}$). This algorithm forms a ciphertext $c_m = (a_0, a_1, t_m, \nu_m)$ which encrypts the message $m_m = \alpha\cdot m$. This procedure is needed in our implementation of homomorphic AES, and is also useful for general computation over finite fields.

The algorithm makes use of a procedure Randomize$(\alpha)$, which replaces each nonzero coefficient of $\alpha$ with a coefficient chosen at random from $\{-1, 1\}$. To multiply by $\alpha$, we set $\beta \leftarrow \mathsf{Randomize}(\alpha)$ and then just multiply both $c_0$ and $c_1$ by $\beta$. Since every nonzero coefficient $\pm 1$ of $\beta$ is odd, $\beta \equiv \alpha \pmod 2$, so the result still encrypts $\alpha\cdot m$. We can bound the norm of $\beta$ using the same argument as in Appendix A.5 for the distribution $\mathcal{HWT}(h)$. This gives $\|\beta\|^{\mathrm{can}} \le 6\sqrt{\mathrm{Wt}(\alpha)}$, where $\mathrm{Wt}(\alpha)$ is the number of nonzero coefficients of $\alpha$. Hence we multiply the noise estimate by $6\sqrt{\mathrm{Wt}(\alpha)}$ and output the resulting ciphertext $c_m = (c_0\cdot\beta, c_1\cdot\beta, t, \nu\cdot 6\sqrt{\mathrm{Wt}(\alpha)})$.
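A minimal Python sketch of Randomize and of the resulting noise-estimate update follows; the names `randomize`, `wt` and `scalar_mult_noise` are ours. It illustrates the point that makes the procedure correct. Because $\beta \equiv \alpha \pmod 2$ coefficient-wise, multiplying by $\beta$ instead of $\alpha$ gives the same product modulo 2. This is checked here in the toy negacyclic ring $\mathbb{Z}[X]/(X^{16}+1)$ rather than $\mathbb{Z}[X]/\Phi_m(X)$.

```python
import math
import random

n = 16

def randomize(alpha):
    """Replace each nonzero coefficient of alpha by a uniformly random sign in {-1, +1}."""
    return [random.choice((-1, 1)) if x != 0 else 0 for x in alpha]

def wt(alpha):
    """Number of nonzero coefficients of alpha."""
    return sum(1 for x in alpha if x != 0)

def ring_mul(f, g):
    """Integer product f*g in Z[X]/(X^n + 1) (negacyclic convolution)."""
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] += fi * gj
            else:
                out[i + j - n] -= fi * gj
    return out

def scalar_mult_noise(nu, alpha):
    """Noise estimate after Scalar-Mult: nu * 6 * sqrt(Wt(alpha))."""
    return nu * 6 * math.sqrt(wt(alpha))

random.seed(13)
alpha = [random.choice((-1, 0, 0, 1)) for _ in range(n)]
beta = randomize(alpha)
x = [random.randint(-50, 50) for _ in range(n)]     # stands in for c_0 or c_1
same_mod_2 = [v % 2 for v in ring_mul(beta, x)] == [v % 2 for v in ring_mul(alpha, x)]
print(wt(alpha), same_mod_2, scalar_mult_noise(10.0, alpha))
```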

Automorphism$(c, \kappa)$: In the main body we explained how permutations on the plaintext slots can be realized using elements $\kappa \in \mathrm{Gal}$. We also require the application of such automorphisms to implement the Frobenius maps in our AES implementation.

For each $\kappa$ that we want to use, we need to include in the public key the “matrix” $W[\kappa(s) \to s]$. Then, given a ciphertext $c = (c_0, c_1, t, \nu)$ representing the message $m$, the function Automorphism$(c, \kappa)$ produces a ciphertext $c' = (c'_0, c'_1, t, \nu')$ which represents the message $\kappa(m)$. We first form an “extended ciphertext” by setting

$$d_0 = \kappa(c_0), \quad d_1 = 0, \quad\text{and}\quad d_2 = \kappa(c_1),$$

and then apply key switching to the extended ciphertext $((d_0, d_1, d_2), t, \nu)$ using the “matrix” $W[\kappa(s) \to s]$.
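To see why the extended ciphertext above is valid for the key $(1, -s, -\kappa(s))$, note that $\kappa$ is a ring homomorphism. It follows that $d_0 - s\cdot d_1 - \kappa(s)\cdot d_2 = \kappa(c_0 - s\cdot c_1)$. The Python sketch below checks this in the toy ring $\mathbb{Z}_q[X]/(X^{16}+1)$. This is a power-of-two cyclotomic (i.e., $m = 32$), chosen only because $\kappa_k: X \mapsto X^k$ for odd $k$ is easy to write there. The modulus and function names are hypothetical.

```python
import random

n, q = 16, 12289

def ring_mul(f, g):
    """Product f*g in Z_q[X]/(X^n + 1) via negacyclic convolution."""
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] += fi * gj
            else:
                out[i + j - n] -= fi * gj
    return [x % q for x in out]

def kappa(f, k):
    """Automorphism X -> X^k (k odd) of Z_q[X]/(X^n + 1), using X^n = -1."""
    assert k % 2 == 1
    out = [0] * n
    for i, fi in enumerate(f):
        j = (i * k) % (2 * n)              # X^(i*k) = X^j because X^(2n) = 1
        if j < n:
            out[j] = (out[j] + fi) % q
        else:
            out[j - n] = (out[j - n] - fi) % q
    return out

random.seed(17)
c0, c1, s = ([random.randrange(q) for _ in range(n)] for _ in range(3))
k = 3
d0, d2 = kappa(c0, k), kappa(c1, k)       # d1 = 0
lhs = [(x - y) % q for x, y in zip(d0, ring_mul(kappa(s, k), d2))]
rhs = kappa([(x - y) % q for x, y in zip(c0, ring_mul(s, c1))], k)
print(lhs == rhs)
```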

C Security Analysis and Parameter Settings

Below we derive the concrete parameters for use in our implementation. The derivation has three steps:

- In Appendix C.1, we derive a lower bound on the dimension $N$ of the LWE problem underlying our key-switching matrices, as a function of the modulus and the noise variance. (This will serve as a lower bound on $\phi(m)$ for our choice of the ring polynomial $\Phi_m(X)$.)
- In Appendix C.2, we derive a lower bound on the size of the largest modulus $Q$ in our implementation, in terms of the noise variance and the dimension $N$.
- In Appendix C.3, we choose a value for the noise variance (as small as possible subject to some nominal security concerns), solve the somewhat circular constraints on $N$ and $Q$, and set all the other parameters.

C.1 Lower-Bounding the Dimension

Below we apply the LWE-security analysis of Lindner and Peikert [20], together with a few (arguably justifiable) assumptions, to analyze the dimension needed for different security levels. The analysis below assumes that we are given the modulus $Q$ and noise variance $\sigma^2$ for the LWE problem (i.e., the noise is chosen from a discrete Gaussian distribution modulo $Q$ with variance $\sigma^2$ in each coordinate). The goal is to derive a lower bound on the dimension $N$ required to get any given security level. The first assumption that we make, of course, is that the Lindner-Peikert analysis (which was done in the context of standard LWE) applies also to our ring-LWE case. We also make the following extra assumptions:
• We assume that (once $\sigma$ is not too tiny) the security depends on the ratio $Q/\sigma$ and not on $Q$ and $\sigma$ separately. Nearly all the attacks and hardness results in the literature support this assumption. The exception is the Arora-Ge attack [2], which works whenever $\sigma$ is very small, regardless of $Q$.

• The analysis in [20] proceeded in three steps:
  1. It devised an experimental formula for the time that it takes to get a particular quality of reduced basis (i.e., the parameter $\delta$ of Gama and Nguyen [12]).
  2. It provided another formula for the advantage that the attack can derive from a reduced basis of a given quality.
  3. It used a computer program to solve these formulas for some given values of $N$ and $\delta$.

  This provides a time/advantage tradeoff: obtaining a smaller value of $\delta$ (i.e., a higher-quality basis) takes longer and provides better advantage for the attacker.

  For our purposes, we assumed that the best runtime/advantage ratio is achieved in the high-advantage regime. Namely, we should spend basically all the attack running time doing lattice reduction, in order to get a basis good enough to break security with advantage (say) 1/2. This assumption is consistent with the results that are reported in [20].

• Finally, we assume the following about getting advantage close to 1/2 for an LWE instance with modulus $Q$ and noise $\sigma$. We need to be able to reduce the basis well enough that the shortest vector is of size roughly $Q/\sigma$. Again, this is consistent with the results that are reported in [20].

Given these assumptions and the formulas from [20], we can now solve the dimension/security tradeoff analytically. Because of the first assumption, we might as well simplify the equations and derive our lower bound on $N$ for the case $\sigma = 1$, where the ratio $Q/\sigma$ is equal to $Q$. (In reality we will use $\sigma \approx 4$ and increase the modulus by the same 2 bits.)

Following Gama-Nguyen [12], recall the notion of basis quality. Let $B = (b_1|b_2|\ldots|b_M)$ be a reduced basis for a dimension-$M$, determinant-$D$ lattice, with $\|b_1\| \le \|b_2\| \le \cdots \le \|b_M\|$. Then $B$ has quality parameter $\delta$ if the shortest vector in that basis has norm $\|b_1\| = \delta^M\cdot D^{1/M}$. In other words, the quality of $B$ is defined as $\delta = \|b_1\|^{1/M}/D^{1/M^2}$. The time (in seconds) that it takes to compute a reduced basis of quality $\delta$ for a random LWE instance was estimated in [20] to be at least

$$\log(\mathit{time}) \ge 1.8/\log(\delta) - 110. \qquad (7)$$

For a random $Q$-ary lattice of rank $N$, the determinant is exactly $Q^N$ whp, and therefore a quality-$\delta$ basis has $\|b_1\| = \delta^M\cdot Q^{N/M}$. By our last assumption, we should reduce the basis enough so that $\|b_1\| = Q$, so we need $Q = \delta^M\cdot Q^{N/M}$. The LWE attacker gets to choose the dimension $M$. The best choice for this attack is obtained when the right-hand side of the last equality is minimized, namely for $M = \sqrt{N\log Q/\log\delta}$. This yields the condition

$$\log Q = \log\big(\delta^M Q^{N/M}\big) = M\log\delta + (N/M)\log Q = 2\sqrt{N\log Q\log\delta},$$

which we can solve for $N$ to get $N = \log Q/(4\log\delta)$. Finally, we can use Equation (7) to express $\log\delta$ as a function of $\log(\mathit{time})$, thus getting $N = \log Q\cdot(\log(\mathit{time}) + 110)/7.2$. Recall that in our case we used $\sigma = 1$ (so $Q/\sigma = Q$). This gives our lower bound on $N$ in terms of $Q/\sigma$. Namely, to ensure a time/advantage ratio of at least $2^k$, we need to set the rank $N$ to be at least

$$N \ge \frac{\log(Q/\sigma)(k + 110)}{7.2}. \qquad (8)$$

For example, the above formula gives the following requirements:

- for an 80-bit security level, $N \ge \log(Q/\sigma)\cdot 26.4$;
- for a 100-bit security level, $N \ge \log(Q/\sigma)\cdot 29.1$;
- for a 128-bit security level, $N \ge \log(Q/\sigma)\cdot 33.1$.

These values are indeed consistent with the values reported in [20].
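The derivation can be double-checked numerically. The Python sketch below is illustrative, uses logarithms base 2, and assumes a hypothetical $\log Q = 200$. It does three things:

- it minimizes $M\log\delta + (N/M)\log Q$ by brute force over integer $M$;
- it confirms that with $N = \log Q/(4\log\delta)$ the minimum is about $\log Q$, attained near $M = 2N$;
- it prints the factors $(k + 110)/7.2$ of Equation (8) for $k = 80, 100, 128$, which come out as about 26.39, 29.17 and 33.06.

```python
import math

def log_delta_for(k: float) -> float:
    """Invert Equation (7) at log(time) = k: log(delta) = 1.8 / (k + 110)."""
    return 1.8 / (k + 110)

def min_dimension(log_q_over_sigma: float, k: float) -> float:
    """Equation (8): N >= log(Q/sigma) * (k + 110) / 7.2."""
    return log_q_over_sigma * (k + 110) / 7.2

log_q, k = 200.0, 80
log_d = log_delta_for(k)
N = log_q / (4 * log_d)                                  # N = log Q / (4 log delta)
best = min(M * log_d + (N / M) * log_q for M in range(1, 100000))
M_star = math.sqrt(N * log_q / log_d)                    # equals 2N at this choice of N
print(round(M_star), round(best, 3), round(min_dimension(log_q, k), 1))
for kk in (80, 100, 128):
    print(kk, round((kk + 110) / 7.2, 2))
```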
C.1.1 LWE with Sparse Key

The analysis above applies to “generic” LWE instances, but in our case we use very sparse secret keys (with only $h = 64$ nonzero coefficients, all chosen as $\pm 1$). This brings up the question of whether one can get better attacks against LWE instances with a very sparse secret (much smaller than even the noise). Goldwasser et al. proved in [16] that LWE with a low-entropy secret is as hard as standard LWE with weaker parameters (for large enough moduli). The specific parameters from that proof do not apply to our choice of parameters. Still, the result indicates that weak-secret LWE is not “fundamentally weaker” than standard LWE. In terms of attacks, we found only one attack that takes advantage of this sparse key. It applies the reduction technique of Applebaum et al. [1] to switch the key with part of the error vector, thus getting a smaller LWE error.

In a sparse-secret LWE instance, we are given a random $N$-by-$M$ matrix $A$ (modulo $Q$), and also an $M$-vector $y = [sA + e]_Q$. Here the $N$-vector $s$ is our very sparse secret, and $e$ is the error $M$-vector (which is also short, but not sparse and not as short as $s$).

Below, let $A_1$ denote the first $N$ columns of $A$, $A_2$ the next $N$ columns, then $A_3$, $A_4$, etc. Similarly, $e_1, e_2, \ldots$ are the corresponding parts of the error vector and $y_1, y_2, \ldots$ the corresponding parts of $y$. Assume that $A_1$ is invertible, which happens with high probability. We can then transform this into an LWE instance with respect to the secret $e_1$, as follows. All arithmetic is modulo $Q$, with row vectors multiplied on the left of matrices.

We have $y_1 = sA_1 + e_1$, or alternatively $y_1A_1^{-1} = s + e_1A_1^{-1}$. Also, for $i > 1$ we have $y_i = sA_i + e_i$, which together with the above gives $y_i - y_1A_1^{-1}A_i = e_i - e_1A_1^{-1}A_i$. Hence, we denote

$$B_1 \stackrel{\mathrm{def}}{=} A_1^{-1}, \quad\text{and for } i > 1\quad B_i \stackrel{\mathrm{def}}{=} -A_1^{-1}A_i,$$

and similarly

$$z_1 \stackrel{\mathrm{def}}{=} y_1A_1^{-1}, \quad\text{and for } i > 1\quad z_i \stackrel{\mathrm{def}}{=} y_i - y_1A_1^{-1}A_i.$$

We then set $B = (B_1|B_2|B_3|\ldots)$, $z = (z_1|z_2|z_3|\ldots)$, and also $f = (s|e_2|e_3|\ldots)$. With these definitions we get the LWE instance

$$z = e_1B + f$$

with secret $e_1$. This LWE instance is potentially easier than the original one because the first part of the error vector $f$ is our sparse/small vector $s$. So the transformed instance has smaller error than the original, which means that it is easier to solve.
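The transformation can be checked mechanically on a toy instance. The Python sketch below uses hypothetical sizes ($Q = 12289$, $N = 4$, three blocks so $M = 12$) and the same row-vector convention. It proceeds as follows:

1. it inverts $A_1$ by Gauss-Jordan elimination modulo the prime $Q$;
2. it builds $B$ and $z$ as above;
3. it checks $z_i = e_1B_i + f_i \pmod Q$ for every block.

It only verifies the algebra; it is not an attack implementation.

```python
import random

Q, N, BLOCKS = 12289, 4, 3          # toy prime modulus, secret length, number of N-column blocks

def vec_mat(v, X):
    """Row vector v times matrix X, modulo Q."""
    return [sum(v[k] * X[k][j] for k in range(len(v))) % Q for j in range(len(X[0]))]

def mat_mul(X, Y):
    return [vec_mat(row, Y) for row in X]

def mat_inv(X):
    """Inverse modulo the prime Q by Gauss-Jordan elimination; ValueError if singular."""
    size = len(X)
    A = [list(row) + [int(i == j) for j in range(size)] for i, row in enumerate(X)]
    for col in range(size):
        piv = next((r for r in range(col, size) if A[r][col] % Q), None)
        if piv is None:
            raise ValueError("singular matrix")
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, Q)
        A[col] = [(x * inv) % Q for x in A[col]]
        for r in range(size):
            if r != col and A[r][col]:
                fac = A[r][col]
                A[r] = [(x - fac * y) % Q for x, y in zip(A[r], A[col])]
    return [row[size:] for row in A]

random.seed(19)
while True:                                     # resample until A_1 is invertible (whp at once)
    blocks = [[[random.randrange(Q) for _ in range(N)] for _ in range(N)] for _ in range(BLOCKS)]
    try:
        A1_inv = mat_inv(blocks[0])
        break
    except ValueError:
        pass

s = [random.choice((0, 0, 0, 1, -1)) for _ in range(N)]                  # sparse secret
e = [[random.randint(-3, 3) for _ in range(N)] for _ in range(BLOCKS)]   # e_1, e_2, e_3
y = [[(u + v) % Q for u, v in zip(vec_mat(s, A_i), e_i)] for A_i, e_i in zip(blocks, e)]

B = [A1_inv] + [[[(-x) % Q for x in row] for row in mat_mul(A1_inv, A_i)] for A_i in blocks[1:]]
z1 = vec_mat(y[0], A1_inv)
z = [z1] + [[(u - v) % Q for u, v in zip(y_i, vec_mat(z1, A_i))]
            for y_i, A_i in zip(y[1:], blocks[1:])]
f = [s] + e[1:]

ok = all(z_i == [(u + v) % Q for u, v in zip(vec_mat(e[0], B_i), f_i)]
         for z_i, B_i, f_i in zip(z, B, f))
print("z = e_1 B + f (mod Q):", ok)
```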

To quantify the effect of this attack, note that the optimal $M$ value in the attack from Appendix C.1 above is obtained at $M = 2N$. This means that the new error vector is $f = (s|e_2)$. Its Euclidean norm is smaller than that of $e = (e_1|e_2)$ by roughly a factor of $\sqrt{2}$, assuming that $\|s\| \ll \|e_1\| \approx \|e_2\|$. Maybe some further improvement can be obtained by using a smaller value for $M$, where the shorter error may outweigh the “non-optimal” value of $M$. However, we do not expect to get major improvement this way. So it seems that the very sparse secret should only add maybe one bit to the modulus/noise ratio.

C.2 The Modulus Size

In this section we assume that we are given the parameter $N = \phi(m)$ (for our polynomial ring modulo $\Phi_m(X)$). We also assume that we are given:

- the noise variance $\sigma^2$;
- the number of levels $L$ in the modulus chain;
- an additional “slackness parameter” $\xi$ (whose purpose is explained below);
- the number $h$ of nonzero coefficients in the secret key.

Our goal is to devise a lower bound on the size of the largest modulus $Q$ used in the public key, so as to maintain the functionality of the scheme.
Controlling the Noise. The analysis in this section is driven by a bound on the noise magnitude right after modulus switching, which we denote below by $B$. We set our parameters so that, starting from ciphertexts with noise magnitude $B$, we can perform the following sequence and get the noise magnitude back to the same $B$:

1. one level of fan-in-two multiplications;
2. one level of fan-in-$\xi$ additions;
3. key switching and modulus switching again.


• Recall that in the “reduced canonical embedding norm”, the noise magnitude is at most multiplied by modular multiplication and added by modular addition. Hence, after the multiplication and addition levels, the noise magnitude grows from $B$ to as much as $\xi B^2$.

• As we have seen in Appendix B.3, performing key switching scales up the noise magnitude by a factor of $P$. It also adds another noise term of magnitude up to $B_{\mathrm{KS}}\cdot q_t$ (before doing modulus switching to scale it back down). Hence, starting from noise magnitude $\xi B^2$, the noise grows to magnitude $P\xi B^2 + B_{\mathrm{KS}}\cdot q_t$ (relative to the modulus $Pq_t$).

Below we assume that after key switching we do modulus switching directly to a smaller modulus.

• After key switching we can switch to the next modulus $q_{t-1}$ to decrease the noise back to our bound $B$. Following the analysis from Appendix B.2, switching moduli from $Q_t$ to $q_{t-1}$ decreases the noise magnitude by a factor of $q_{t-1}/Q_t = 1/(P\cdot p_t)$. It then adds a noise term of magnitude $B_{\mathrm{scale}}$.

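Starting from noise magnitude $P\xi B^2 + B_{\mathrm{KS}}\cdot q_t$ before modulus switching, the noise magnitude after modulus switching is therefore bounded whp by (using $q_t/p_t = q_{t-1}$)

$$\frac{P\cdot\xi B^2 + B_{\mathrm{KS}}\cdot q_t}{P\cdot p_t} + B_{\mathrm{scale}} = \frac{\xi B^2}{p_t} + \frac{B_{\mathrm{KS}}\cdot q_{t-1}}{P} + B_{\mathrm{scale}}.$$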

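Using the analysis above, our goal next is to set the parameters $B$, $P$ and the $p_t$'s (as functions of $N$, $\sigma$, $L$, $\xi$ and $h$) so that at every level $t$ we get $\frac{\xi B^2}{p_t} + \frac{B_{\mathrm{KS}}\cdot q_{t-1}}{P} + B_{\mathrm{scale}} \le B$. Namely, we need to satisfy at every level $t$ the quadratic inequality (in $B$)

$$\frac{\xi}{p_t}B^2 - B + \underbrace{\Big(\frac{B_{\mathrm{KS}}\cdot q_{t-1}}{P} + B_{\mathrm{scale}}\Big)}_{\text{denote this by } R_{t-1}} < 0. \qquad (9)$$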


Assume that all the primes $p_t$ are roughly the same size. Then it suffices to satisfy this inequality for the largest modulus $t = L-2$, since $R_{t-1}$ increases with larger $t$'s. Since $R_{L-3} > B_{\mathrm{scale}}$, we want this term to be as close to $B_{\mathrm{scale}}$ as possible, which we can do by setting $P$ large enough. Specifically, to make it as close as $R_{L-3} = (1 + 2^{-n})B_{\mathrm{scale}}$, it is sufficient to set

$$P \approx 2^n\frac{B_{\mathrm{KS}}\, q_{L-3}}{B_{\mathrm{scale}}} \approx 2^n\frac{9\sigma N q_{L-3}}{77\sqrt{N}} \approx 2^{n-3}q_{L-3}\cdot\sigma\sqrt{N}. \qquad (10)$$


Below we set (say) $n = 8$, which makes it close enough to use just $R_{L-3} \approx B_{\mathrm{scale}}$ for the derivation below.

To satisfy Inequality (9), we must have a positive discriminant. This means $1 - 4\frac{\xi}{p_{L-2}}R_{L-3} > 0$, or $p_{L-2} > 4\xi R_{L-3}$. Using the value $R_{L-3} \approx B_{\mathrm{scale}}$, this translates into setting

$$p_1 \approx p_2 \approx \cdots \approx p_{L-2} \approx 4\xi\cdot B_{\mathrm{scale}} \approx 308\xi\sqrt{N}. \qquad (11)$$


Finally, with the discriminant positive and all the $p_t$'s roughly the same size, we can satisfy Inequality (9) by setting

$$B \approx \frac{1}{2\xi/p_{L-2}} = \frac{p_{L-2}}{2\xi} \approx 2B_{\mathrm{scale}} \approx 154\sqrt{N}. \qquad (12)$$
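The choices (10)-(12) can be sanity-checked numerically. The Python sketch below is illustrative and makes these assumptions:

- a hypothetical $N = 10752$ and $\xi = 8$;
- $R_{L-3} = (1 + 2^{-8})B_{\mathrm{scale}}$, as produced by Equation (10) with $n = 8$;
- $p_{L-2}$ set one percent above the discriminant threshold $4\xi R_{L-3}$. The one-percent slack is our assumption; exactly at the threshold the discriminant is zero.

It then evaluates the left-hand side of Inequality (9) at $B = p_{L-2}/(2\xi)$, which works out to $R_{L-3} - p_{L-2}/(4\xi) < 0$.

```python
import math

def noise_parameters(N, xi=8, h=64, n=8, slack=1.01):
    b_scale = 2 * math.sqrt(N / 3) * (8 * math.sqrt(h) + 3)   # Equation (3)
    r = (1 + 2.0 ** -n) * b_scale          # R_{L-3} once P is chosen as in Equation (10)
    p = slack * 4 * xi * r                 # just above the threshold p_{L-2} > 4 xi R_{L-3}
    B = p / (2 * xi)                       # Equation (12)
    lhs = (xi / p) * B ** 2 - B + r        # left-hand side of Inequality (9)
    return b_scale, p, B, lhs

N = 10752
b_scale, p, B, lhs = noise_parameters(N)
print(f"p/sqrt(N) = {p / math.sqrt(N):.0f}, B/sqrt(N) = {B / math.sqrt(N):.1f}, lhs = {lhs:.1f}")
```

The results land close to the rounded constants $308\xi\sqrt{N}$ and $154\sqrt{N}$ of Equations (11) and (12). The small differences come from the slack and from using 77.4 rather than 77 for $B_{\mathrm{scale}}/\sqrt{N}$.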
The Smallest Modulus. After evaluating our $L$-level circuit, we arrive at the last modulus $q_0 = p_0$ with noise bounded by $\xi B^2$. To be able to decrypt, we need this noise to be smaller than $q_0/2c_m$, where $c_m$ is the ring constant for our polynomial ring modulo $\Phi_m(X)$. For our setting, that constant is always below 40, so a sufficient condition for being able to decrypt is to set

$$q_0 = p_0 \approx 80\xi B^2 \approx 2^{20.9}\xi N. \qquad (13)$$

The Encryption Modulus. Recall that freshly encrypted ciphertexts have noise $B_{\mathrm{clean}}$ (as defined in Equation (6)), which is larger than our baseline bound $B$ from above. We want the first modulus switch to bring this noise down to $B$. We therefore set the ratio $p_{L-1} = q_{L-1}/q_{L-2}$ so that $B_{\mathrm{clean}}/p_{L-1} + B_{\mathrm{scale}} \le B$. This means that we set

$$p_{L-1} = \frac{B_{\mathrm{clean}}}{B - B_{\mathrm{scale}}} \approx \frac{74N + 858\sqrt{N}}{77\sqrt{N}} \approx \sqrt{N} + 11. \qquad (14)$$


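The Largest Modulus. Having set all the parameters, we are now ready to calculate the resulting bound on the largest modulus, namely $Q_{L-2} = q_{L-2}\cdot P$. Using Equations (11) and (13), we get

$$q_t = p_0\cdot\prod_{i=1}^{t}p_i \approx (2^{20.9}\xi N)\cdot(308\xi\sqrt{N})^t = 2^{20.9}\cdot 308^t\cdot\xi^{t+1}\cdot N^{t/2+1}. \qquad (15)$$

Now using Equation (10) (with $n = 8$) we have

$$P \approx 2^5 q_{L-3}\sigma\sqrt{N} \approx 2^{25.9}\cdot 308^{L-3}\cdot\xi^{L-2}\cdot N^{(L-3)/2+1}\cdot\sigma\sqrt{N} \approx 2\cdot 308^L\cdot\xi^{L-2}\sigma N^{L/2},$$

and finally

$$Q_{L-2} = P\cdot q_{L-2} \approx \big(2\cdot 308^L\cdot\xi^{L-2}\sigma N^{L/2}\big)\cdot\big(2^{20.9}\cdot 308^{L-2}\cdot\xi^{L-1}\cdot N^{L/2}\big) \approx \sigma\cdot 2^{16.5L+5.4}\cdot\xi^{2L-3}\cdot N^L. \qquad (16)$$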
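The arithmetic behind Equations (13)-(16) can be checked with a few lines of Python. The sketch below is illustrative, with hypothetical inputs $N = 10752$, $\xi = 8$, $\sigma = 3.2$, $L = 10$. It performs three checks:

1. it recomputes the constant $2^{20.9}$ in Equation (13);
2. it checks that $p_{L-1}$ from Equation (14) meets $B_{\mathrm{clean}}/p_{L-1} + B_{\mathrm{scale}} = B$;
3. it compares $\log_2 Q_{L-2}$ obtained by multiplying the factors of (16) against the closed form $\sigma\cdot 2^{16.5L+5.4}\cdot\xi^{2L-3}\cdot N^L$.

The two agree to within a fraction of a bit. The gap comes from rounding $2\log_2 308 \approx 16.53$ to 16.5.

```python
import math

N, xi, sigma, L = 10752, 8, 3.2, 10
lg = math.log2
sqrtN = math.sqrt(N)

B, b_scale = 154 * sqrtN, 77 * sqrtN                 # Equation (12) and the rounded Equation (3)
p0 = 80 * xi * B ** 2                                # Equation (13)
b_clean = 74 * N + 858 * sqrtN                       # Equation (6)
p_last = b_clean / (B - b_scale)                     # Equation (14)

def log2_q(t):
    """Equation (15): log2 of q_t = 2^20.9 * 308^t * xi^(t+1) * N^(t/2+1)."""
    return 20.9 + t * lg(308) + (t + 1) * lg(xi) + (t / 2 + 1) * lg(N)

log2_P = 1 + L * lg(308) + (L - 2) * lg(xi) + lg(sigma) + (L / 2) * lg(N)
log2_Q_product = log2_P + log2_q(L - 2)
log2_Q_closed = lg(sigma) + 16.5 * L + 5.4 + (2 * L - 3) * lg(xi) + L * lg(N)

print(round(lg(80 * 154 ** 2), 2), round(p_last, 1), round(sqrtN + 11, 1))
print(round(log2_Q_product, 1), round(log2_Q_closed, 1))
```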


C.3 Putting It Together

We now have two bounds. Equation (8) gives a lower bound on $N$ in terms of $Q$, $\sigma$ and the security level $k$. Equation (16) gives a lower bound on $Q$ with respect to $N$, $\sigma$ and several other parameters. Note that $\sigma$ is a free parameter, since it drops out when substituting Equation (16) in Equation (8). In our implementation we used $\sigma = 3.2$, which is the smallest value consistent with the analysis in [23].

For the other parameters, we make two choices:

- we set $\xi = 8$, to get a small “wiggle room” without increasing the parameters much;
- we set the number of nonzero coefficients in the secret key at $h = 64$. This value is already included in the formulas from above, and should easily defeat exhaustive-search/birthday types of attacks.

Substituting these values into the equations above, we get

$$p_0 \approx 2^{23.9}N, \qquad p_i \approx 2^{11.3}\sqrt{N} \ \text{ for } i = 1, \ldots, L-2,$$

$$P \approx 2^{11.3L-5}\sigma N^{L/2}, \qquad\text{and}\qquad Q_{L-2} \approx 2^{22.5L-3.6}\sigma N^L.$$
Substituting the last value of $Q_{L-2}$ into Equation (8) yields

$$N > \frac{\big(L(\log N + 23) - 8.5\big)(k + 110)}{7.2}.$$

Targeting $k = 80$ bits of security and solving for several different depth parameters $L$, we get the results in the table below, which also lists approximate sizes for the primes $p_i$ and $P$.


L     N        log2(p_0)   log2(p_i)   log2(p_{L-1})   log2(P)
10    9326     37.1        17.9        7.5             177.3
20    19434    38.1        18.4        8.1             368.8
30    29749    38.7        18.7        8.4             564.2
40    40199    39.2        18.9        8.6             762.2
50    50748    39.5        19.1        8.7             962.1
60    61376    39.8        19.2        8.9             1163.5
70    72071    40.0        19.3        9.0             1366.1
80    82823    40.2        19.4        9.1             1569.8
90    93623    40.4        19.5        9.2             1774.5
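Because $N$ appears on both sides of the last inequality, it can be solved by fixed-point iteration. The Python sketch below is an illustration, not the authors' solver. It uses logarithms base 2 and $k = 80$. It reproduces the $N$ column of the table up to rounding. It also reproduces the $\log_2(p_0)$ and $\log_2(p_i)$ columns via the closed forms $p_0 \approx 2^{23.9}N$ and $p_i \approx 2^{11.3}\sqrt{N}$.

```python
import math

def solve_dimension(L: int, k: int = 80, iterations: int = 100) -> float:
    """Fixed point of N = (L*(log2 N + 23) - 8.5) * (k + 110) / 7.2."""
    N = 1000.0                                   # arbitrary starting guess
    for _ in range(iterations):
        N = (L * (math.log2(N) + 23) - 8.5) * (k + 110) / 7.2
    return N

for L in (10, 20, 30):
    N = solve_dimension(L)
    print(L, math.ceil(N), round(23.9 + math.log2(N), 1), round(11.3 + math.log2(N) / 2, 1))
```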

Choosing Concrete Values. Having obtained lower bounds on $N = \phi(m)$ and other parameters, we now need to fix precise cyclotomic fields $\mathbb{Q}(\zeta_m)$ to support the algebraic operations we need. There are two situations of interest for our experiments:

- the first corresponds to performing arithmetic on bytes in $\mathbb{F}_{2^8}$ (i.e., $n = 8$);
- the second corresponds to arithmetic on bits in $\mathbb{F}_2$ (i.e., $n = 1$).

We therefore need to find an odd value of $m$ with $\phi(m) \approx N$ and $m$ dividing $2^d - 1$, where we require that $d$ is divisible by $n$. The selection follows these preferences:

- values of $m$ with a small number of prime factors are preferred, as they give rise to smaller values of $c_m$;
- we look for parameters that maximize the number of slots $\ell$ we can deal with in one go;
- we look for values for which $\phi(m)$ is close to the approximate value for $N$ estimated above;
- when $n = 1$, we always select a set of parameters for which the $\ell$ value is at least as large as that obtained when $n = 8$.


            n = 8                                         n = 1
L     m        N = φ(m)   (ℓ, d)       c_m       m        N = φ(m)   (ℓ, d)       c_m
10    11441    10752      (48, 224)    3.60      11023    10800      (45, 240)    5.13
20    34323    21504      (48, 448)    6.93      34323    21504      (48, 448)    6.93
30    31609    31104      (72, 432)    5.15      32377    32376      (57, 568)    1.27
40    54485    40960      (64, 640)    12.40     42799    42336      (21, 2016)   5.95
50    59527    51840      (72, 720)    21.12     54161    52800      (60, 880)    4.59
60    68561    62208      (72, 864)    36.34     85865    63360      (60, 1056)   12.61
70    82603    75264      (56, 1344)   36.48     82603    75264      (56, 1344)   36.48
80    92837    84672      (56, 1512)   38.52     101437   84672      (42, 2016)   19.13
90    124645   98304      (48, 2048)   21.07     95281    94500      (45, 2100)   6.22
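The structural quantities in this table can be recomputed directly:

- $\phi(m)$, by factoring $m$;
- $d$, as the multiplicative order of 2 modulo $m$;
- the number of slots, as $\ell = \phi(m)/d$.

The Python sketch below is an illustrative helper, not the search procedure used to pick these $m$. It does not compute $c_m$.

```python
def euler_phi(m: int) -> int:
    """Euler's totient by trial-division factorization."""
    result, rest, p = m, m, 2
    while p * p <= rest:
        if rest % p == 0:
            while rest % p == 0:
                rest //= p
            result -= result // p
        p += 1
    if rest > 1:
        result -= result // rest
    return result

def order_of_two(m: int) -> int:
    """Smallest d >= 1 with 2^d = 1 (mod m); requires odd m > 1."""
    if m % 2 == 0 or m < 3:
        raise ValueError("m must be odd and greater than 1")
    d, x = 1, 2 % m
    while x != 1:
        x = (2 * x) % m
        d += 1
    return d

def slot_structure(m: int, n: int):
    """Return (phi(m), d, number of slots l, whether n divides d)."""
    phi, d = euler_phi(m), order_of_two(m)
    return phi, d, phi // d, d % n == 0

for m in (11441, 34323, 101437):
    print(m, slot_structure(m, 8))
```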

D Scale$(c, q_t, q_{t-1})$ in dble-CRT Representation

Let $q_t = \prod_{j=0}^{t}p_j$, where the $p_j$'s are primes that split completely in our cyclotomic field $\mathbb{A}$. We are given a $c \in \mathbb{A}_{q_t}$ represented via double-CRT. That is, it is represented as a “matrix” of its evaluations at the primitive $m$-th roots of unity modulo the primes $p_0, \ldots, p_t$. We want to modulus switch to $q_{t-1}$, i.e., scale down by a factor of $p_t$. Let us recall what this means: we want to output $d \in \mathbb{A}$, represented via double-CRT format (as its matrix of evaluations modulo the primes $p_0, \ldots, p_{t-1}$), such that

1. $d \equiv c \bmod 2$.
2. $d$ is very close (in terms of its coefficient vector) to $c/p_t$.

In  the  main  body  we  explained  how  this  could  be  performed  in  dble-CRT  representation.  This  made  explicit 
use  of  the  fact  that  the  two  ciphertexts  need  to  be  equivalent  modulo  two.  If  we  wished  to  replace  two  with 
a  general  prime  p,  then  things  arc  a  bit  more  complicated.  For  completeness,  although  it  is  not  required  in 
our  scheme,  we  present  a  methodology  below.  In  this  case,  the  conditions  on  d  arc  as  follows: 

1 .  d  =  c  •  pt  mod  p. 

2.  d  is  very  close  to  c. 

3.  d  is  divisible  by  pt. 

As  before,  we  set  d  <—  d /pt-  (Note  that  for  p  =  2,  we  trivially  have  c-pt  =  c  mod  p,  since  pt  will  be  odd.) 

This  causes  some  complications,  because  we  set  d  <—  c  +  5 ,  where  6  =  —  c  mod  p/  (as  before)  but  now 
5  =  (pt  —  1)  •  c  mod  p.  To  compute  such  a  5,  we  need  to  know  c  mod  p.  Unfortunately,  we  don’t  have 
c  mod  p.  One  not-very-satisfying  way  of  dealing  with  this  problem  is  the  following.  Set  c  [pt]p-c  mod  (p. 
Now,  if  c  encrypted  m,  then  c  encrypts  [pi]P  ■  m,  and  c’s  noise  is  [pi]P  <  p/2  times  as  large.  It  is  obviously 
easy  to  compute  c’s  double-CRT  format  from  c’s.  Now,  we  set  d  so  that  the  following  is  true: 

1 .  d  =  c  mod  p. 

2.  d  is  very  close  to  c. 

3.  d  is  divisible  by  pt. 

This  is  easy  to  do.  The  algorithm  to  output  d  in  double-CRT  format  is  as  follows: 

1.  Set  c  to  be  the  coefficient  representation  of  c  mod  p\  .  (Computing  this  requires  a  single  “small  FFT” 
modulo  the  prime  pt.) 

2.  Set  5  to  be  the  polynomial  with  coefficients  in  (— pt  ■  p/2,pt  •  p/2]  such  that  5  =  0  rnodp  and 
6  =  — c  mod  pt- 

3.  Set  d  =  c  +  5,  and  output  c^’s  double-CRT  representation. 

(a)  We  already  have  c’s  double-CRT  representation. 

(b)  Computing  5’s  double-CRT  representation  requires  t  “small  FFTs”  modulo  the  p  fs. 
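Step 2 is a coefficient-wise Chinese remaindering. The sketch below is a simplification of our own: it works on plain integer coefficient vectors rather than on the double-CRT representation (so the small FFTs are skipped), it treats $\hat{c}$ as the unreduced integer product $[p_t]_p \cdot c$, and the helper names are hypothetical. It checks that the result $c'/p_t$ agrees with $c$ modulo $p$ and lies within $p/2$ of $\hat{c}/p_t$ in every coefficient.

```python
import random


def centered_mod(x: int, q: int) -> int:
    """Representative of x modulo q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def crt_delta(c_bar: int, p: int, p_t: int) -> int:
    """delta in (-p_t*p/2, p_t*p/2] with delta = 0 (mod p) and delta = -c_bar (mod p_t)."""
    k = (-c_bar * pow(p, -1, p_t)) % p_t   # then p*k = -c_bar (mod p_t)
    return centered_mod(p * k, p * p_t)    # recentring keeps both residues


def scale_general_p(c: list[int], p: int, p_t: int) -> list[int]:
    """Coefficient-wise sketch: returns (c_hat + delta) / p_t for c_hat = [p_t]_p * c."""
    factor = centered_mod(p_t, p)          # [p_t]_p
    out = []
    for ci in c:
        c_hat = factor * ci
        delta = crt_delta(c_hat % p_t, p, p_t)
        assert (c_hat + delta) % p_t == 0  # condition 3: divisible by p_t
        out.append((c_hat + delta) // p_t)
    return out


rng = random.Random(1)
p, p_t = 5, 97                             # toy primes, far smaller than real ones
c = [rng.randrange(-10**6, 10**6) for _ in range(8)]
c_scaled = scale_general_p(c, p, p_t)
assert all((x - y) % p == 0 for x, y in zip(c_scaled, c))   # c'/p_t = c (mod p)
```

Since $|\delta| \le p_t \cdot p/2$, dividing by $p_t$ moves each coefficient by at most $p/2$, which is the "very close" condition in scaled form.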


E  Other  Optimizations 

Some  other  optimizations  that  we  encountered  during  our  implementation  work  are  discussed  next.  Not  all 
of  these  optimizations  are  useful  for  our  current  implementation,  but  they  may  be  useful  in  other  contexts. 
Three-way Multiplications. Sometimes we need to multiply several ciphertexts together, and if their number is not a power of two then we do not have a complete binary tree of multiplications, which means that at some point in the process we will have three ciphertexts that we need to multiply together.

The standard way of implementing this 3-way multiplication is via two 2-argument multiplications, e.g., $x \cdot (y \cdot z)$. But it turns out that here it is better to use "raw multiplication" to multiply these three ciphertexts (as done in [7]), thus getting an "extended" ciphertext with four elements, then apply key-switching (and later modulus switching) to this ciphertext. This takes only six ring-multiplication operations (as opposed to eight according to the standard approach), three modulus switchings (as opposed to four), and only one key switching (applied to this 4-element ciphertext) rather than two (which are applied to 3-element extended ciphertexts). All in all, this three-way multiplication takes roughly 1.5 times as long as a standard two-element multiplication.

We stress that this technique is not useful for larger products, since for more than three multiplicands the noise begins to grow too large. But with only three multiplicands we get noise of roughly $B^3$ after the multiplication, which can be reduced to noise $\approx B$ by dropping two levels, and this is also what we get by using two standard two-element multiplications.

Commuting Automorphisms and Multiplications. Recalling that the automorphisms $X \mapsto X^i$ commute with the arithmetic operations, we note that some orderings of these operations can sometimes be better than others. For example, it may be better to perform the multiplication-by-constant before the automorphism operation whenever possible. The reason is that if we perform the multiply-by-constant after the key-switching that follows the automorphism, then the added noise term due to that key-switching is multiplied by the same constant, thereby making the noise slightly larger. We note that to move the multiplication-by-constant before the automorphism, we need to multiply by a different constant.
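The "different constant" can be made concrete: since $\kappa_i : X \mapsto X^i$ is a ring automorphism with inverse $\kappa_{i^{-1}}$, we have $\kappa_i(x) \cdot a = \kappa_i(x \cdot \kappa_{i^{-1}}(a))$, so multiplying by $a$ after the automorphism is the same as multiplying by $\kappa_{i^{-1}}(a)$ before it. The Python sketch below checks this identity on plaintext polynomials only (no encryption, no noise). Working in $\mathbb{Z}[X]/(X^m - 1)$ instead of modulo $\Phi_m(X)$ is a simplification of ours; the identity carries over to the quotient modulo $\Phi_m(X)$.

```python
import random


def kappa(a: list[int], i: int, m: int) -> list[int]:
    """The map X -> X^i on Z[X]/(X^m - 1); a holds the m coefficients of a polynomial."""
    out = [0] * m
    for j, coeff in enumerate(a):
        out[(i * j) % m] += coeff
    return out


def ring_mul(a: list[int], b: list[int], m: int) -> list[int]:
    """Product in Z[X]/(X^m - 1), i.e. a cyclic convolution of coefficient lists."""
    out = [0] * m
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[(j + k) % m] += aj * bk
    return out


def constant_before_automorphism(a: list[int], i: int, m: int) -> list[int]:
    """Constant a' with kappa_i(x) * a == kappa_i(x * a'), namely a' = kappa_{i^-1}(a)."""
    return kappa(a, pow(i, -1, m), m)


m, i = 15, 2                                   # toy parameters with gcd(i, m) = 1
rng = random.Random(0)
x = [rng.randrange(-9, 10) for _ in range(m)]  # stands in for a plaintext polynomial
a = [rng.randrange(-9, 10) for _ in range(m)]  # the constant being multiplied in
after = ring_mul(kappa(x, i, m), a, m)
before = kappa(ring_mul(x, constant_before_automorphism(a, i, m), m), i, m)
assert after == before
```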

Switching to higher-level moduli. We note that it may be better to perform automorphisms at a higher level, in order to make the added noise term due to key-switching small with respect to the modulus. On the other hand, operations at high levels are more expensive than the same operations at a lower level. A good rule of thumb is to perform the automorphism operations one level above the lowest one. Namely, if the naive strategy that never switches to higher-level moduli would perform some Frobenius operation at level $q_i$, then we perform the key-switching following this Frobenius operation at level $Q_{i+1}$, and then switch back to level $q_{i+1}$ (rather than using $Q_i$ and $q_i$).

Commuting Addition and Modulus-switching. When we need to add many terms that were obtained from earlier operations (and their subsequent key-switching), it may be better to first add all of these terms relative to the large modulus $Q_i$ before switching the sum down to the smaller $q_i$ (as opposed to switching all the terms individually to $q_i$ and then adding).
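One way to see the benefit, in a deliberately simplified scalar model of our own: each modulus switch rounds $(q_i/Q_i) \cdot x$ to an integer and so contributes its own rounding error. In a real ciphertext that error vector is further multiplied by the secret key, and the switch must also preserve the plaintext modulo 2; both aspects are ignored here. Switching $k$ terms separately accumulates up to $k$ rounding errors, while switching their sum incurs only one (and costs one switch instead of $k$).

```python
from fractions import Fraction
import random


def switch(x: int, big_q: int, small_q: int) -> int:
    """Scale x from modulus big_q to small_q by rounding (no mod-2 correction)."""
    return round(Fraction(x * small_q, big_q))


rng = random.Random(3)
big_q, small_q, k = 2**40, 2**20, 64            # toy moduli and number of terms
terms = [rng.randrange(-big_q // 2, big_q // 2) for _ in range(k)]
exact = Fraction(sum(terms) * small_q, big_q)   # the unrounded scaled sum

err_switch_each = abs(sum(switch(t, big_q, small_q) for t in terms) - exact)
err_switch_sum = abs(switch(sum(terms), big_q, small_q) - exact)
print(float(err_switch_each), float(err_switch_sum))
```

The sum-then-switch error is at most $1/2$, whereas the switch-then-sum error is only bounded by $k/2$.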

Reducing the number of key-switching matrices. When using many different automorphisms $\kappa_i : X \mapsto X^i$ we need to keep many different key-switching matrices in the public key, one for every value of $i$ that we use. We can reduce this memory requirement, at the expense of taking longer to perform the automorphisms. We use the fact that the Galois group $\mathcal{G}al$ that contains all the maps $\kappa_i$ (which is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^*$) is generated by a relatively small number of generators. (Specifically, for our choice of parameters the group $(\mathbb{Z}/m\mathbb{Z})^*$ has two or three generators.) It is therefore enough to store in the public key only the key-switching matrices corresponding to $\kappa_{g_j}$'s for these generators $g_j$ of the group $\mathcal{G}al$. Then in order to apply a map $\kappa_i$, we express it as a product of the generators and apply these generators to get the effect of $\kappa_i$. (For example, if $i = g_1^2 \cdot g_2$ then we need to apply $\kappa_{g_1}$ twice followed by a single application of $\kappa_{g_2}$.)
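Finding the product of generators for a given $i$ is a small discrete-logarithm search in $(\mathbb{Z}/m\mathbb{Z})^*$. For groups of this size an exhaustive search over exponent vectors is enough; the sketch below does exactly that and returns the sequence of generator maps to apply. The toy modulus $m = 15$ with generators 2 and 11 is our own example, not one of the paper's parameter sets, and a practical implementation would presumably precompute a lookup table once per $m$.

```python
from itertools import product


def mult_order(g: int, m: int) -> int:
    """Multiplicative order of g modulo m (g must be a unit modulo m)."""
    k, x = 1, g % m
    while x != 1:
        x = (x * g) % m
        k += 1
    return k


def generator_schedule(i: int, gens: list[int], m: int) -> list[int]:
    """Generators g whose maps kappa_g, applied in sequence, compose to kappa_i."""
    orders = [mult_order(g, m) for g in gens]
    for exps in product(*(range(o) for o in orders)):
        acc = 1
        for g, e in zip(gens, exps):
            acc = (acc * pow(g, e, m)) % m
        if acc == i % m:
            return [g for g, e in zip(gens, exps) for _ in range(e)]
    raise ValueError(f"{i} is not in the subgroup generated by {gens}")


print(generator_schedule(14, [2, 11], 15))  # [2, 2, 11], since 14 = 2^2 * 11 (mod 15)
```

Because the Galois group is abelian, the order in which the listed maps are applied does not affect the result.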
Fully Homomorphic Encryption without Modulus Switching

from  Classical  GapSVP 

Zvika  Brakerski* 


Abstract 

We present a new tensoring technique for LWE-based fully homomorphic encryption. While in all previous works, the ciphertext noise grows quadratically ($B \to B^2 \cdot \mathrm{poly}(n)$) with every multiplication (before "refreshing"), our noise only grows linearly ($B \to B \cdot \mathrm{poly}(n)$).

We use this technique to construct a scale-invariant fully homomorphic encryption scheme, whose properties only depend on the ratio between the modulus $q$ and the initial noise level $B$, and not on their absolute values.

Our scheme has a number of advantages over previous candidates: It uses the same modulus throughout the evaluation process (no need for "modulus switching"), and this modulus can take arbitrary form. In addition, security can be classically reduced from the worst-case hardness of the GapSVP problem (with quasi-polynomial approximation factor), whereas previous constructions could only exhibit a quantum reduction from GapSVP.
1 Introduction


Fully homomorphic encryption has been the focus of extensive study since the first candidate scheme was introduced by Gentry [Gen09b]. In a nutshell, fully homomorphic encryption allows one to perform arbitrary computation on encrypted data. It can thus be used, for example, to outsource a computation to a remote server without compromising data privacy.

The first generation of fully homomorphic schemes [Gen09b, DGHV10, SV10, BV11a, CMNT11, GH11] that started with Gentry's seminal work all followed a similar and fairly complicated methodology, often relying on relatively strong computational assumptions. A second generation of schemes started with the work of Brakerski and Vaikuntanathan [BV11b], who established full homomorphism in a simpler way, based on the learning with errors (LWE) assumption. Using known reductions [Reg05, Pei09], the security of their construction is based on the (often quantum) hardness of approximating some short vector problems in worst-case lattices. Their scheme was then improved by Brakerski, Gentry and Vaikuntanathan [BGV12], as we describe below.

In LWE-based schemes such as [BV11b, BGV12], ciphertexts are represented as vectors over $\mathbb{Z}_q$, for some modulus $q$. The decryption process is essentially computing an inner product of the ciphertext and the secret key vector, which produces a noisy version of the message (the noise is added at encryption for security purposes). The noise increases with every homomorphic operation, and correct decryption is guaranteed if the final noise magnitude is below $q/4$. Homomorphic addition roughly doubles the noise, while homomorphic multiplication roughly squares it.

In the [BV11b] scheme, after $L$ levels of multiplication (e.g. evaluating a depth-$L$ multiplication tree), the noise grows from an initial magnitude of $B$ to $B^{2^L}$. Hence, to enable decryption, a very large modulus $q \approx B^{2^L}$ was required. This affected both efficiency and security (the security of the scheme depends inversely on the ratio $q/B$, so bigger $q$ for the same $B$ means less security).

The above was improved by [BGV12], who suggested scaling down the ciphertext vector after every multiplication (they call this "modulus switching", see below).¹ That is, to go from a vector $\mathbf{c}$ over $\mathbb{Z}_q$ into the vector $\mathbf{c}/w$ over $\mathbb{Z}_{q/w}$ (for some scaling factor $w$). Scaling "switches" the modulus $q$ to a smaller $q/w$, but also reduces the noise by the same factor (from $B$ to $B/w$). To see why this change of scale is effective, consider scaling by a factor $B$ after every multiplication (as indeed suggested by [BGV12]): After the first multiplication, the noise goes up to $B^2$, but scaling brings it back down to $B$, at the cost of reducing the modulus to $q/B$. With more multiplications, the noise magnitude always goes back to $B$, but the modulus keeps reducing. After $L$ levels of multiplication-and-scaling, the noise magnitude is still $B$, but the modulus is down to $q/B^L$. Therefore it is sufficient to use $q \approx B^{L+1}$, which is significantly lower than before. However, this process results in a complicated homomorphic evaluation process that "climbs down the ladder of moduli".

The success of the scaling methodology teaches us that perspective matters: scaling does not change the ratio between the modulus and noise, but it still manages the noise better by changing the perspective in which we view the ciphertext. In this work, we suggest working in an invariant perspective where only the ratio $q/B$ matters (and not the absolute values of $q, B$ as in previous works). We derive a scheme that is superior to the previous best known in simplicity, noise management and security. Details follow.

¹ A different scaling technique was already suggested in [BV11b] as a way to simplify decryption and improve efficiency, but not to manage noise.
1.1 Our Results


As explained above, we present a scale-invariant scheme, by finding an invariant perspective. The idea is very natural based on the outlined motivation: if we scale down the ciphertext by a factor of $q$, we get a fractional ciphertext modulo 1, with noise magnitude $B/q$. In this perspective, all choices of $q, B$ with the same $B/q$ ratio will look the same. It turns out that in this perspective, homomorphic multiplication does not square the noise, but rather multiplies it by a polynomial factor $p(n)$ that depends only on the security parameter.² After $L$ levels of multiplication, the noise will grow from $B/q$ to $(B/q) \cdot p(n)^L$, which means that we only need to use $q \approx B \cdot p(n)^L$.
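To make the three regimes concrete, the sketch below tallies how many bits of modulus each approach needs for $L$ levels, working purely with base-2 logarithms and ignoring constant factors such as the $q/4$ decryption margin. The values chosen for $\log_2 B$ and $\log_2 p(n)$ are arbitrary placeholders of ours, not figures from the paper.

```python
def log_q_bv11b(log_b: float, levels: int) -> float:
    """[BV11b]: the noise is squared at every level, so q must exceed B^(2^L)."""
    log_noise = log_b
    for _ in range(levels):
        log_noise *= 2
    return log_noise


def log_q_bgv12(log_b: float, levels: int) -> float:
    """[BGV12]: square, then scale by B; each level costs log B bits, so q ~ B^(L+1)."""
    return log_b + levels * log_b


def log_q_scale_invariant(log_b: float, log_p: float, levels: int) -> float:
    """Scale-invariant view: the ratio B/q grows by p(n) per level, so q ~ B * p(n)^L."""
    return log_b + levels * log_p


log_b, log_p = 10, 12                 # placeholder values for log2 B and log2 p(n)
for levels in (1, 5, 10):
    print(levels, log_q_bv11b(log_b, levels), log_q_bgv12(log_b, levels),
          log_q_scale_invariant(log_b, log_p, levels))
```

Which of the last two is smaller depends on the relative sizes of $B$ and $p(n)$; the comparison only illustrates that the first requirement grows exponentially in $L$ while the other two grow linearly.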

Interestingly,  the  idea  of  working  modulo  1  goes  back  to  the  early  works  of  Ajtai  and  Dwork  [AD97], 
and  Regev  [Reg03],  and  to  the  first  formulation  of  LWE  [Reg05].  In  a  sense,  we  are  “going  back  to 
the  roots”  and  showing  that  these  early  ideas  are  instrumental  in  the  construction  of  homomorphic 
encryption. 

For technical reasons, we don't implement the scheme over fractions, but rather mimic the invariant perspective over $\mathbb{Z}_q$ (see Section 1.2 for more details). Perhaps surprisingly, the resulting scheme is exactly Regev's original LWE-based scheme, with additional auxiliary information for the purpose of homomorphic evaluation. The properties of our scheme are summarized in the following theorem:

Theorem. There exists a homomorphic encryption scheme for depth-$L$ circuits, based on the $\mathrm{DLWE}_{n,q,\chi}$ assumption ($n$-dimensional decision-LWE modulo $q$, with noise $\chi$), so long as

$$q/B \ge (O(n \log q))^{L + O(1)},$$

where $B$ is a bound on the values of $\chi$.

The  resulting  scheme  has  a  number  of  interesting  properties: 

1.  Scale  invariance.  Homomorphic  properties  only  depend  on  q/B  (as  explained  above). 

2. No modulus switching. We work with a single modulus $q$. We don't need to switch moduli as in [BV11b, BGV12]. This leads to a simpler description of the scheme (and hopefully better implementations).

3. No restrictions on the modulus. Our modulus $q$ can take any form (so long as it satisfies the size requirement). This is achieved by putting the message bit in the most significant bit of the ciphertext, rather than least significant as in previous homomorphic schemes (this can be interpreted as making the message scale invariant). We note that for odd $q$, the least and most significant bit representations are interchangeable.

In particular, in our scheme $q$ can be a power of 2, which can simplify implementation of arithmetic.³ In previous schemes, such $q$ could not be used for binary message spaces.⁴

This, again, is going back to the roots: Early schemes such as [Reg05], and in a sense also [AD97, GGH97], encoded ciphertexts in the most significant bits. Switching to least significant bit encoding was (perhaps ironically) motivated by improving homomorphism.

² More accurately, a polynomial $p(n, \log q)$, but w.l.o.g. $q < 2^n$.

³ On the downside, such $q$ might reduce efficiency when using ring-LWE (see below) due to FFT embedding issues.

⁴ [GHS11a] gain on efficiency by using moduli that are "almost" a power of 2.
4. No restrictions on the secret key distribution. While [BGV12] requires that the secret
key  is  drawn  from  the  noise  distribution  (LWE  in  Hermite  normal  form),  our  scheme  works 
under  any  secret  key  distribution  for  which  the  LWE  assumption  holds. 

5. Classical Reduction from GapSVP. One of the appeals of LWE-based cryptography is the known quantum (Regev [Reg05]) and classical (Peikert [Pei09]) reductions from the worst-case hardness of lattice problems. Specifically to $\mathrm{GapSVP}_\gamma$, which is the problem of deciding, given an $n$-dimensional lattice and a number $d$, between the following two cases: either the lattice has a vector shorter than $d$, or it doesn't have any vector shorter than $\gamma(n) \cdot d$. The value of $\gamma$ depends on the ratio $q/B$ (essentially $\gamma = (q/B) \cdot \tilde{O}(n)$), and the smaller $\gamma$ is, the better the reduction ($\mathrm{GapSVP}_{2^{\Omega(n)}}$ is an easy problem).

Peikert's classical reduction requires that $q \approx 2^{n/2}$, which makes his reduction unusable for previous homomorphic schemes, since $\gamma$ becomes exponential. For example, in [BGV12], $q/B = q/q^{1/(L+1)} = q^{1 - 1/(L+1)}$, which translates to $\gamma \approx 2^{n/2}$ for the required $q$.⁵

In our scheme, this problem does not arise. We can instantiate our scheme with any $q$ while hardly affecting the ratio $q/B$. We can therefore set $q \approx 2^{n/2}$ and get a classical reduction from $\mathrm{GapSVP}_{n^{O(\log n)}}$, which is currently solvable only in $2^{\tilde{\Omega}(n)}$ time. (This is mostly of theoretical interest, though, since efficiency considerations will favor the smallest possible $q$.)

Using  our  scheme  as  a  building  block  we  achieve: 

1. Fully homomorphic encryption using bootstrapping. Using Gentry's bootstrapping theorem, we present a leveled fully homomorphic scheme based on the classical worst-case $\mathrm{GapSVP}_{n^{O(\log n)}}$ problem. As usual, an additional circular security assumption is required to get a non-leveled scheme.

2. Leveled fully homomorphic encryption without bootstrapping. Very similarly to [BGV12], our scheme can be used to achieve leveled homomorphism without bootstrapping.

3. Increased efficiency using ring-LWE (RLWE). RLWE (defined in [LPR10]) is a version of LWE that works over polynomial rings rather than the integers. Its hardness is quantumly related to short vector problems in ideal lattices.

RLWE is a stronger assumption than LWE, but it can dramatically improve the efficiency of schemes [BV11a, BGV12, GHS11b]. Our methods are readily portable to the RLWE world.

In  summary,  our  construction  carries  conceptual  significance  in  its  simplicity  and  in  a  number 
of  theoretical  aspects.  Its  practical  usefulness  compared  to  other  schemes  is  harder  to  quantify, 
though,  since  it  will  vary  greatly  with  the  specific  implementation  and  optimizations  chosen. 

1.2  Our  Techniques 

Our starting point is Regev's public key encryption scheme. There, the encryption of a message $m \in \{0,1\}$ is an integer vector $\mathbf{c}$ such that $\langle \mathbf{c}, \mathbf{s} \rangle = \lfloor q/2 \rfloor \cdot m + e + qI$, for an integer $I$ and for $|e| \le E$, for some bound $E < q/4$. The secret key vector $\mathbf{s}$ is also over the integers. We can assume w.l.o.g. that the elements of $\mathbf{c}, \mathbf{s}$ are in the segment $(-q/2, q/2]$. (We note that previous homomorphic constructions used a different variant of Regev's scheme, where $\langle \mathbf{c}, \mathbf{s} \rangle = m + 2e + qI$.)

⁵ Peikert suggests classically basing small-$q$ LWE on a new lattice problem that he introduces.

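In this work, we take the invariant perspective on the scheme, and consider the fractional ciphertext $\tilde{\mathbf{c}} = \mathbf{c}/q$. It holds that $\langle \tilde{\mathbf{c}}, \mathbf{s} \rangle = \frac{1}{2} m + \tilde{e} + I$, where $I \in \mathbb{Z}$ and $|\tilde{e}| \le E/q = \epsilon$. The elements of $\tilde{\mathbf{c}}$ are now rational numbers in $(-1/2, 1/2]$. Note that the secret key does not change and is still over $\mathbb{Z}$.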

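Additive homomorphism is immediate: if $\tilde{\mathbf{c}}_1$ encrypts $m_1$ and $\tilde{\mathbf{c}}_2$ encrypts $m_2$, then $\tilde{\mathbf{c}}_{\mathrm{add}} = \tilde{\mathbf{c}}_1 + \tilde{\mathbf{c}}_2$ encrypts $[m_1 + m_2]_2$. The noise grows from $\epsilon$ to $\approx 2\epsilon$. Multiplicative homomorphism is achieved by tensoring the input ciphertexts:

$$\tilde{\mathbf{c}}_{\mathrm{mult}} = 2 \cdot \tilde{\mathbf{c}}_1 \otimes \tilde{\mathbf{c}}_2 .$$

The tensored ciphertext can be decrypted using a tensored secret key because

$$\langle \underbrace{2 \cdot \tilde{\mathbf{c}}_1 \otimes \tilde{\mathbf{c}}_2}_{\tilde{\mathbf{c}}_{\mathrm{mult}}}, \mathbf{s} \otimes \mathbf{s} \rangle = 2 \cdot \langle \tilde{\mathbf{c}}_1, \mathbf{s} \rangle \cdot \langle \tilde{\mathbf{c}}_2, \mathbf{s} \rangle .$$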

A "key switching" mechanism developed in [BV11b] and generalized in [BGV12] allows one to switch back from a tensored secret key into a "normal" one without much additional noise. The details of this mechanism are immaterial for this discussion. We focus on the noise growth in the tensored ciphertext.

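We want to show that $2 \cdot \langle \tilde{\mathbf{c}}_1, \mathbf{s} \rangle \cdot \langle \tilde{\mathbf{c}}_2, \mathbf{s} \rangle \approx \frac{1}{2} m_1 m_2 + e' + I'$, for a small $e'$. To do this, we let $I_1, I_2 \in \mathbb{Z}$ be integers such that $\langle \tilde{\mathbf{c}}_1, \mathbf{s} \rangle = \frac{1}{2} m_1 + \tilde{e}_1 + I_1$, and likewise for $\tilde{\mathbf{c}}_2$. It can be verified that $|I_1|, |I_2|$ are bounded by $\approx \|\mathbf{s}\|_1$. We therefore get:

$$\begin{aligned}
2 \cdot \langle \tilde{\mathbf{c}}_1, \mathbf{s} \rangle \cdot \langle \tilde{\mathbf{c}}_2, \mathbf{s} \rangle &= 2 \cdot \left(\tfrac{1}{2} m_1 + \tilde{e}_1 + I_1\right) \cdot \left(\tfrac{1}{2} m_2 + \tilde{e}_2 + I_2\right) \\
&= \tfrac{1}{2} m_1 m_2 + \underbrace{2(\tilde{e}_1 I_2 + \tilde{e}_2 I_1) + \tilde{e}_1 m_2 + \tilde{e}_2 m_1 + 2 \tilde{e}_1 \tilde{e}_2}_{e'} + \underbrace{(m_1 I_2 + m_2 I_1 + 2 I_1 I_2)}_{I' \in \mathbb{Z}} .
\end{aligned}$$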

Interestingly, the cross-term $\tilde{e}_1 \tilde{e}_2$ that was responsible for the squaring of the noise in previous schemes, is now practically insignificant since $\epsilon^2 \ll \epsilon$. The significant noise term in the above expression is $2(\tilde{e}_1 I_2 + \tilde{e}_2 I_1)$, which is bounded by $O(\|\mathbf{s}\|_1) \cdot \epsilon$. All that is left to show now is that $\|\mathbf{s}\|_1$ is independent of $B, q$ and only depends on $n$ (recall that we allow dependence on $\log q < n$).

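On the face of it, $\|\mathbf{s}\|_1 \approx n \cdot q$, since the elements of $\mathbf{s}$ are integers in the segment $(-q/2, q/2]$. In order to reduce the norm, we use binary decomposition (which was used in [BV11b, BGV12] for different purposes). Let $\mathbf{s}^{(j)}$ denote the binary vector that contains the $j$th bit from each element of $\mathbf{s}$. Namely $\mathbf{s} = \sum_j 2^j \mathbf{s}^{(j)}$. Then

$$\langle \mathbf{c}, \mathbf{s} \rangle = \sum_j 2^j \langle \mathbf{c}, \mathbf{s}^{(j)} \rangle = \langle (\mathbf{c}, 2\mathbf{c}, \ldots), (\mathbf{s}^{(0)}, \mathbf{s}^{(1)}, \ldots) \rangle .$$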
This means that we can convert a ciphertext $\mathbf{c}$ that corresponds to a secret key $\mathbf{s}$ in $\mathbb{Z}$, into a modified ciphertext $(\mathbf{c}, 2\mathbf{c}, \ldots)$ that corresponds to a binary secret key $(\mathbf{s}^{(0)}, \mathbf{s}^{(1)}, \ldots)$. The norm of the binary key is at most its dimension, which is polynomial in $n$ as required.⁶
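The identity behind binary decomposition is easy to check numerically. Because the entries of $\mathbf{s}$ lie in $(-q/2, q/2]$ and may be negative, the sketch below decomposes each entry in sign-magnitude form, so the "bits" $\mathbf{s}^{(j)}$ take values in $\{-1, 0, 1\}$; that convention is an assumption of ours, not the paper's. The $\ell_1$ norm of the expanded key is then still at most its dimension.

```python
import random


def bit_decomp(s: list[int], k: int) -> list[int]:
    """Concatenate s^(0), ..., s^(k-1), where s = sum_j 2^j s^(j), entries in {-1, 0, 1}."""
    return [(1 if x >= 0 else -1) * ((abs(x) >> j) & 1) for j in range(k) for x in s]


def powers_of_two(c: list[int], k: int) -> list[int]:
    """Concatenate c, 2c, 4c, ..., 2^(k-1) c, matching the layout of bit_decomp."""
    return [(1 << j) * x for j in range(k) for x in c]


def inner(u: list[int], v: list[int]) -> int:
    return sum(a * b for a, b in zip(u, v))


q, n = 2**20 + 7, 16                    # toy modulus and dimension
k = q.bit_length()                      # enough bits since |s[i]| <= q/2
lo, hi = -((q - 1) // 2), q // 2        # integer range of (-q/2, q/2]
rng = random.Random(5)
s = [rng.randint(lo, hi) for _ in range(n)]
c = [rng.randint(lo, hi) for _ in range(n)]

s_bits = bit_decomp(s, k)
assert inner(powers_of_two(c, k), s_bits) == inner(c, s)
print(sum(abs(b) for b in s_bits), "<=", n * k)
```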

⁶ Reducing the norm of $\mathbf{s}$ was also an issue in [BGV12]. There it was resolved by using LWE in Hermite normal form, where $\mathbf{s}$ is sampled from the noise distribution and thus $\|\mathbf{s}\|_1 \approx n \cdot B$. This suffices when $B$ must be very small, as in [BGV12], but not in our setting.
We point out that an alternative solution to the norm problem follows by using the dual-Regev
scheme  of  [GPV08]  as  the  basic  building  block.  There,  the  secret  key  is  natively  binary  and  of 
low  norm.  (In  addition,  as  noticed  in  previous  works,  working  with  dual-Regev  naturally  implies  a 
weak  form  of  homomorphic  identity  based  encryption.)  However,  the  ciphertexts  and  some  other 
parameters  will  need  to  grow. 

Finally, working with fractional ciphertexts brings about issues of precision in representation and other problems. We thus implement our scheme over $\mathbb{Z}$ with appropriate scaling: Each rational number $x$ in the above description will be represented by the integer $y = \lfloor q x \rceil$ (which determines $x$ up to an additive factor of $1/(2q)$). The addition operation $x_1 + x_2$ is mimicked by $y_1 + y_2 \approx \lfloor q (x_1 + x_2) \rceil$. To mimic multiplication, we take $\lfloor (y_1 \cdot y_2)/q \rceil \approx \lfloor x_1 \cdot x_2 \cdot q \rceil$. Our tensored ciphertext for multiplication will thus be defined as $\lfloor \frac{2}{q} \cdot \mathbf{c}_1 \otimes \mathbf{c}_2 \rceil$, where $\mathbf{c}_1, \mathbf{c}_2$ are integer vectors and the tensoring operation is over the integers. In this representation, encryption and decryption become identical to Regev's original scheme.
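The integer form of the scheme can be exercised end to end on toy parameters. The Python sketch below is a secret-key simplification of our own: the paper's scheme is public-key and also relies on key switching and binary decomposition, both omitted here. It uses a binary secret so that $\|\mathbf{s}\|_1$ is small, the augmented key $(1, \mathbf{s})$, the message in the most significant position $\lfloor q/2 \rfloor \cdot m$, and multiplication $\lfloor \frac{2}{q} \cdot \mathbf{c}_1 \otimes \mathbf{c}_2 \rceil$ decrypted under the tensored key. The modulus is a power of two, which the scheme permits; all parameters are far too small to be secure.

```python
import random

Q = 2**40       # toy modulus; a power of two is allowed in this scheme
DIM = 8         # toy LWE dimension (insecure)
B = 8           # noise bound: e is drawn uniformly from [-B, B]


def centered(x: int, q: int = Q) -> int:
    """[x]_q: the representative of x modulo q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def keygen(rng: random.Random) -> list[int]:
    """Augmented key (1, s) with binary s, so its l1 norm is at most DIM + 1."""
    return [1] + [rng.randint(0, 1) for _ in range(DIM)]


def encrypt(m: int, key: list[int], rng: random.Random) -> list[int]:
    """Ciphertext c with <c, key> = floor(Q/2)*m + e (mod Q), entries in (-Q/2, Q/2]."""
    a = [rng.randrange(Q) for _ in range(DIM)]
    e = rng.randint(-B, B)
    b = sum(ai * si for ai, si in zip(a, key[1:])) + (Q // 2) * m + e
    return [centered(b)] + [centered(-ai) for ai in a]


def decrypt(c: list[int], key: list[int]) -> int:
    """Round 2*[<c, key>]_Q / Q to the nearest integer and reduce it modulo 2."""
    x = centered(sum(ci * ki for ci, ki in zip(c, key)))
    return round(2 * x / Q) % 2


def tensor(u: list[int], v: list[int]) -> list[int]:
    return [x * y for x in u for y in v]


def mult(c1: list[int], c2: list[int]) -> list[int]:
    """round((2/Q) * c1 (x) c2), computed exactly with integer floor division."""
    return [(2 * z + Q // 2) // Q for z in tensor(c1, c2)]


rng = random.Random(7)
key = keygen(rng)
for m1 in (0, 1):
    for m2 in (0, 1):
        c1, c2 = encrypt(m1, key, rng), encrypt(m2, key, rng)
        assert decrypt(mult(c1, c2), tensor(key, key)) == m1 * m2
```

The noise after one multiplication here is a few hundred, against a decryption threshold of $Q/4 = 2^{38}$, which reflects the linear (rather than quadratic) growth argued above.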

1.3  Paper  Organization 

Section 2 defines notational conventions (we define $\mathbb{Z}_q$ in a slightly unconventional way; the reader is advised to take notice), introduces the LWE assumption and defines homomorphic encryption and related terms. Section 3 introduces our building blocks: Regev's encryption scheme, binary decomposition of vectors and the key switching mechanism. Finally, in Section 4 we present and analyze our scheme, and discuss several possible optimizations.

2  Preliminaries 

For an integer $q$, we define the set $\mathbb{Z}_q = (-q/2, q/2] \cap \mathbb{Z}$. We stress that in this work, $\mathbb{Z}_q$ is not synonymous with the ring $\mathbb{Z}/q\mathbb{Z}$. In particular, all arithmetic is performed over $\mathbb{Z}$ (or $\mathbb{Q}$ when division is used) and not over any sub-ring. For any $x \in \mathbb{Q}$, we let $y = [x]_q$ denote the unique value $y \in (-q/2, q/2]$ such that $y \equiv x \pmod q$ (i.e. $y$ is congruent to $x$ modulo $q$).
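The set $\mathbb{Z}_q$ and the map $[\cdot]_q$ translate directly into integer code. The sketch below restricts $x$ to integers (the text allows $x \in \mathbb{Q}$), uses function names of our own, and reproduces the footnote example with $x = 2$, $y = -3$, $q = 7$.

```python
def centered_rep(x: int, q: int) -> int:
    """[x]_q: the unique y in (-q/2, q/2] with y = x (mod q)."""
    r = x % q                      # r in [0, q)
    return r - q if r > q // 2 else r


def in_Zq(y: int, q: int) -> bool:
    """Membership in the set Z_q = (-q/2, q/2] ∩ Z (a set, not the ring Z/qZ)."""
    return -q < 2 * y <= q


x, y = 2, -3
print(in_Zq(x, 7), in_Zq(y, 7))    # True True
print(in_Zq(x * y, 7))             # False: -6 lies outside Z_7
print(centered_rep(x * y, 7))      # 1
```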

We use $\lfloor x \rceil$ to indicate rounding $x$ to the nearest integer, and $\lfloor x \rfloor$, $\lceil x \rceil$ (for $x \ge 0$) to indicate rounding down or up. All logarithms are to base 2.

Probability. We use $x \xleftarrow{\$} \mathcal{D}$ to denote that $x$ is sampled from a distribution $\mathcal{D}$. Similarly, $x \xleftarrow{\$} S$ denotes that $x$ is uniform over a set $S$. We define $B$-bounded distributions as ones whose magnitudes never exceed $B$:⁸

Definition 2.1. A distribution $\chi$ over the integers is $B$-bounded (denoted $|\chi| \le B$) if it is only supported on $[-B, B]$.

A function is negligible if it vanishes faster than any inverse polynomial. Two distributions are statistically indistinguishable if the total variation distance between them is negligible, and computationally indistinguishable if no polynomial-time test distinguishes them with non-negligible advantage.

⁷ For example, if $x = 2$, $y = -3 \in \mathbb{Z}_7$, then $x \cdot y = -6 \notin \mathbb{Z}_7$, however $[x \cdot y]_7 = 1 \in \mathbb{Z}_7$.

⁸ This definition is simpler and slightly different from previous works.
Vectors, Matrices and Tensors. We denote scalars in plain (e.g. $x$) and vectors in bold lowercase (e.g. $\mathbf{v}$), and matrices in bold uppercase (e.g. $\mathbf{A}$). For the sake of brevity, we use $(\mathbf{x}, \mathbf{y})$ to refer to the vector $[\mathbf{x}^T \,|\, \mathbf{y}^T]^T$.

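The $\ell_1$ norm of a vector is denoted by $\|\mathbf{v}\|_1$. Inner product is denoted by $\langle \mathbf{v}, \mathbf{u} \rangle$; recall that $\langle \mathbf{v}, \mathbf{u} \rangle = \mathbf{v}^T \cdot \mathbf{u}$. Let $\mathbf{v}$ be an $n$-dimensional vector. For all $i = 1, \ldots, n$, the $i$th element in $\mathbf{v}$ is denoted $\mathbf{v}[i]$. When applied to vectors, operators such as $[\cdot]_q$, $\lfloor \cdot \rceil$ are applied element-wise.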

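The tensor product of two vectors $\mathbf{v}, \mathbf{w}$ of dimension $n$, denoted $\mathbf{v} \otimes \mathbf{w}$, is the $n^2$-dimensional vector containing all elements of the form $\mathbf{v}[i]\mathbf{w}[j]$. Note that

$$\langle \mathbf{v} \otimes \mathbf{w}, \mathbf{x} \otimes \mathbf{y} \rangle = \langle \mathbf{v}, \mathbf{x} \rangle \cdot \langle \mathbf{w}, \mathbf{y} \rangle .$$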
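The tensor identity, which is what lets a tensored ciphertext be decrypted under a tensored key, can be spot-checked on random integer vectors. The index order used below ($i$-major) is our choice; any order works as long as both tensor products use the same one.

```python
import random


def tensor(v: list[int], w: list[int]) -> list[int]:
    """v ⊗ w as the flat list of all products v[i] * w[j], i-major."""
    return [vi * wj for vi in v for wj in w]


def inner(u: list[int], v: list[int]) -> int:
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(2)
n = 6
v, w, x, y = ([rng.randrange(-100, 101) for _ in range(n)] for _ in range(4))
assert inner(tensor(v, w), tensor(x, y)) == inner(v, x) * inner(w, y)
```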

2.1  Learning  With  Errors  (LWE) 

The LWE problem was introduced by Regev [Reg05] as a generalization of "learning parity with noise". For positive integers $n$ and $q \ge 2$, a vector $\mathbf{s} \in \mathbb{Z}_q^n$, and a probability distribution $\chi$ on $\mathbb{Z}$, let $A_{\mathbf{s},\chi}$ be the distribution obtained by choosing a vector $\mathbf{a} \xleftarrow{\$} \mathbb{Z}_q^n$ uniformly at random and a noise term $e \xleftarrow{\$} \chi$, and outputting $(\mathbf{a}, [\langle \mathbf{a}, \mathbf{s} \rangle + e]_q) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$. Decisional LWE (DLWE) is defined as follows.

Definition 2.2 (DLWE). For an integer $q = q(n)$ and an error distribution $\chi = \chi(n)$ over $\mathbb{Z}$, the (average-case) decision learning with errors problem, denoted $\mathrm{DLWE}_{n,m,q,\chi}$, is to distinguish (with non-negligible advantage) $m$ samples chosen according to $A_{\mathbf{s},\chi}$ (for uniformly random $\mathbf{s} \xleftarrow{\$} \mathbb{Z}_q^n$), from $m$ samples chosen according to the uniform distribution over $\mathbb{Z}_q^n \times \mathbb{Z}_q$. We denote by $\mathrm{DLWE}_{n,q,\chi}$ the variant where the adversary gets oracle access to $A_{\mathbf{s},\chi}$, and is not a-priori bounded in the number of samples.
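For intuition, here is how samples from $A_{\mathbf{s},\chi}$ and from the uniform distribution could be generated. As a stand-in for $\chi$ the sketch uses the uniform distribution on $[-B, B]$, which is $B$-bounded in the sense of Definition 2.1 but is not the discretized Gaussian used by the reductions; the parameters and function names are toy choices of ours. The final check needs $\mathbf{s}$: a holder of the secret can tell the two sample sets apart by testing whether $[b - \langle \mathbf{a}, \mathbf{s} \rangle]_q$ is small, whereas DLWE asserts that without $\mathbf{s}$ no efficient algorithm can.

```python
import random


def centered(x: int, q: int) -> int:
    """[x]_q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def lwe_samples(s: list[int], q: int, bound: int, count: int, rng: random.Random):
    """Samples (a, [<a, s> + e]_q) with a uniform over Z_q^n and e uniform on [-bound, bound]."""
    out = []
    for _ in range(count):
        a = [centered(rng.randrange(q), q) for _ in s]
        e = rng.randint(-bound, bound)
        out.append((a, centered(sum(ai * si for ai, si in zip(a, s)) + e, q)))
    return out


def uniform_samples(n: int, q: int, count: int, rng: random.Random):
    """Samples drawn uniformly from Z_q^n x Z_q."""
    return [([centered(rng.randrange(q), q) for _ in range(n)], centered(rng.randrange(q), q))
            for _ in range(count)]


def looks_like_lwe(sample, s: list[int], q: int, bound: int) -> bool:
    """Secret-key test: is b - <a, s> small modulo q?"""
    a, b = sample
    return abs(centered(b - sum(ai * si for ai, si in zip(a, s)), q)) <= bound


rng = random.Random(4)
q, n, bound = 2**20, 16, 4
s = [centered(rng.randrange(q), q) for _ in range(n)]
real = lwe_samples(s, q, bound, 100, rng)
fake = uniform_samples(n, q, 100, rng)
hits_real = sum(looks_like_lwe(t, s, q, bound) for t in real)
hits_fake = sum(looks_like_lwe(t, s, q, bound) for t in fake)
print(hits_real, hits_fake)
```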

There are known quantum (Regev [Reg05]) and classical (Peikert [Pei09]) reductions between $\mathrm{DLWE}_{n,m,q,\chi}$ and approximating short vector problems in lattices. Specifically, these reductions take $\chi$ to be (discretized versions of) the Gaussian distribution, which is statistically indistinguishable from $B$-bounded, for an appropriate $B$. Since the exact distribution $\chi$ does not matter for our results, we state a corollary of the results of [Reg05, Pei09] (in conjunction with the search to decision reduction of Micciancio and Mol [MM11] and Micciancio and Peikert [MP11]) in terms of the bound $B$. These results also extend to additional forms of $q$ (see [MM11, MP11]).

Corollary 2.1 ([Reg05, Pei09, MM11, MP11]). Let $q = q(n) \in \mathbb{N}$ be either a prime power $q = p^r$, or a product of co-prime numbers $q = \prod q_i$ such that for all $i$, $q_i = \mathrm{poly}(n)$, and let $B \ge \omega(\log n) \cdot \sqrt{n}$. Then there exists an efficiently sampleable $B$-bounded distribution $\chi$ such that if there is an efficient algorithm that solves the (average-case) $\mathrm{DLWE}_{n,q,\chi}$ problem, then:

• There is an efficient quantum algorithm that solves $\mathrm{GapSVP}_{\tilde{O}(n \cdot q/B)}$ (and $\mathrm{SIVP}_{\tilde{O}(n \cdot q/B)}$) on any $n$-dimensional lattice.

• If in addition $q \ge \tilde{O}(2^{n/2})$, then there is an efficient classical algorithm for $\mathrm{GapSVP}_{\tilde{O}(n \cdot q/B)}$ on any $n$-dimensional lattice.

In both cases, if one also considers distinguishers with sub-polynomial advantage, then we require $B \ge \tilde{O}(n)$ and the resulting approximation factor is slightly larger ($\tilde{O}(n\sqrt{n} \cdot q/B)$).
Recall that $\mathrm{GapSVP}_\gamma$ is the (promise) problem of distinguishing, given a basis for a lattice and a parameter $d$, between the case where the lattice has a vector shorter than $d$, and the case where the lattice doesn't have any vector shorter than $\gamma \cdot d$. SIVP is the search problem of finding a set of "short" vectors. We refer the reader to [Reg05, Pei09] for more information.

The best known algorithms for $\mathrm{GapSVP}_\gamma$ ([Sch87, MV10]) require at least $2^{\tilde{\Omega}(n/\log\gamma)}$ time. The scheme we present in this work reduces from $\gamma = n^{O(\log n)}$, for which the best known algorithms run in time $2^{\tilde{\Omega}(n)}$.

As a final remark, we mention that Peikert also shows a classical reduction in the case of small values of $q$, but this reduction is from a newly defined “$\zeta$-to-$\gamma$ decisional shortest vector problem”, which is not as extensively studied as GapSVP.

2.2  Homomorphic  Encryption  and  Bootstrapping 

We now define homomorphic encryption and introduce Gentry's bootstrapping theorem. Our definitions are mostly taken from [BV11b, BGV12].

A homomorphic (public-key) encryption scheme HE = (HE.Keygen, HE.Enc, HE.Dec, HE.Eval) is a quadruple of ppt algorithms as follows ($n$ is the security parameter):

• Key generation $(pk, evk, sk) \leftarrow \mathsf{HE.Keygen}(1^n)$: Outputs a public encryption key $pk$, a public evaluation key $evk$ and a secret decryption key $sk$.⁹

• Encryption $c \leftarrow \mathsf{HE.Enc}_{pk}(m)$: Using the public key $pk$, encrypts a single-bit message $m \in \{0,1\}$ into a ciphertext $c$.

• Decryption $m \leftarrow \mathsf{HE.Dec}_{sk}(c)$: Using the secret key $sk$, decrypts a ciphertext $c$ to recover the message $m \in \{0,1\}$.

• Homomorphic evaluation $c_f \leftarrow \mathsf{HE.Eval}_{evk}(f, c_1, \ldots, c_\ell)$: Using the evaluation key $evk$, applies a function $f : \{0,1\}^\ell \to \{0,1\}$ to $c_1, \ldots, c_\ell$, and outputs a ciphertext $c_f$.

As in previous works, we represent $f$ as an arithmetic circuit over GF(2) with addition and multiplication gates. Thus it is customary to “break” HE.Eval into homomorphic addition $c_{add} \leftarrow \mathsf{HE.Add}_{evk}(c_1, c_2)$ and homomorphic multiplication $c_{mult} \leftarrow \mathsf{HE.Mult}_{evk}(c_1, c_2)$.
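The gate-by-gate view of HE.Eval can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the scheme: the interface name `HomomorphicScheme`, the gate-list encoding of circuits, and the insecure `PlaintextScheme` stand-in (whose “ciphertexts” are the bits themselves) are hypothetical choices made only to show how evaluating a GF(2) circuit reduces to repeated HE.Add and HE.Mult calls.

```python
from typing import List, Protocol, Tuple


class HomomorphicScheme(Protocol):
    """Hypothetical gate interface standing in for HE.Add_evk and HE.Mult_evk."""

    def add(self, c1, c2): ...

    def mult(self, c1, c2): ...


# A GF(2) arithmetic circuit as a list of gates (op, left_wire, right_wire).
# Wires 0..ell-1 hold the input ciphertexts; every gate appends one new wire.
Gate = Tuple[str, int, int]


def evaluate(scheme: HomomorphicScheme, gates: List[Gate], ciphertexts: list):
    """HE.Eval realised gate by gate; returns the ciphertext on the last wire."""
    wires = list(ciphertexts)
    for op, i, j in gates:
        if op == "add":
            wires.append(scheme.add(wires[i], wires[j]))
        elif op == "mult":
            wires.append(scheme.mult(wires[i], wires[j]))
        else:
            raise ValueError(f"unknown gate {op!r}")
    return wires[-1]


class PlaintextScheme:
    """Insecure stand-in whose 'ciphertexts' are the plaintext bits themselves."""

    def add(self, c1, c2):
        return (c1 + c2) % 2  # addition over GF(2)

    def mult(self, c1, c2):
        return c1 * c2  # multiplication over GF(2)


# Majority of three bits over GF(2): x0*x1 + x0*x2 + x1*x2.
MAJ3: List[Gate] = [("mult", 0, 1), ("mult", 0, 2), ("mult", 1, 2),
                    ("add", 3, 4), ("add", 6, 5)]

if __name__ == "__main__":
    print(evaluate(PlaintextScheme(), MAJ3, [1, 0, 1]))  # prints 1
```

Replacing `PlaintextScheme` with an object whose `add`/`mult` implement HE.Add/HE.Mult leaves `evaluate` unchanged. Majority is used as the example circuit because it is the function at the center of Theorem A later in this document.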

A  homomorphic  encryption  scheme  is  said  to  be  secure  if  it  is  semantically  secure  (note  that 
the  adversary  is  given  both  pk  and  evk). 

Homomorphism with respect to depth-bounded circuits and full homomorphism are defined next:

Definition 2.3 ($L$-homomorphism). A scheme HE is $L$-homomorphic, for $L = L(n)$, if for any depth-$L$ arithmetic circuit $f$ (over GF(2)) and any set of inputs $m_1, \ldots, m_\ell$, it holds that

$$\Pr\left[\mathsf{HE.Dec}_{sk}(\mathsf{HE.Eval}_{evk}(f, c_1, \ldots, c_\ell)) \neq f(m_1, \ldots, m_\ell)\right] = \mathrm{negl}(n),$$

where $(pk, evk, sk) \leftarrow \mathsf{HE.Keygen}(1^n)$ and $c_i \leftarrow \mathsf{HE.Enc}_{pk}(m_i)$.

Definition  2.4  (compactness  and  full  homomorphism).  A  homomorphic  scheme  is  compact  if  its 
decryption  circuit  is  independent  of  the  evaluated  function.  A  compact  scheme  is  fully  homomorphic 
if  it  is  L-homomorphic  for  any  polynomial  L.  The  scheme  is  leveled  fully  homomorphic  if  it  takes 
$1^L$ as additional input in key generation.

⁹ We adopt the terminology of [BV11b] that treats the evaluation key as a separate entity from the public key.
Gentry's bootstrapping theorem shows how to go from $L$-homomorphism to full homomorphism:

Theorem  2.2  (bootstrapping  [Gen09b,  Gen09a]).  If  there  exists  an  L-homomorphic  scheme  whose 
decryption  circuit  depth  is  less  than  L,  then  there  exists  a  leveled  fully  homomorphic  encryption 
scheme. 

Furthermore, if the aforementioned $L$-homomorphic scheme is also weak circular secure (remains secure even against an adversary who gets encryptions of the bits of the secret key), then there exists a fully homomorphic encryption scheme.


3  Building  Blocks 

In this section, we present building blocks from previous works that are used in our construction. Specifically, like all LWE-based fully homomorphic schemes, we rely on Regev's [Reg05] basic public-key encryption scheme (Section 3.1). We also use the key-switching methodology of [BV11b, BGV12] (Section 3.2).


3.1  Regev’s  Encryption  Scheme 


Let $q = q(n)$ be an integer function and let $\chi = \chi(n)$ be a distribution ensemble over $\mathbb{Z}$. The scheme Regev is defined as follows:

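• $\mathsf{Regev.SecretKeygen}(1^n)$: Sample $s \xleftarrow{\$} \mathbb{Z}_q^n$. Output $sk = s$.

• $\mathsf{Regev.PublicKeygen}(s)$: Let $N \triangleq (n+1)\cdot(\log q + O(1))$. Sample $A \xleftarrow{\$} \mathbb{Z}_q^{N\times n}$ and $e \xleftarrow{\$} \chi^N$. Compute $b := [A\cdot s + e]_q$, and define

$$P := [\,b \,\|\, -A\,] \in \mathbb{Z}_q^{N\times(n+1)}.$$

Output $pk = P$.

• $\mathsf{Regev.Enc}_{pk}(m)$: To encrypt a message $m \in \{0,1\}$ using $pk = P$, sample $r \in \{0,1\}^N$ and output ciphertext

$$c := \left[P^T\cdot r + \left\lfloor \frac{q}{2}\right\rfloor\cdot\mathbf{m}\right]_q \in \mathbb{Z}_q^{n+1},$$

where $\mathbf{m} \triangleq (m, 0, \ldots, 0) \in \{0,1\}^{n+1}$.

• $\mathsf{Regev.Dec}_{sk}(c)$: To decrypt $c \in \mathbb{Z}_q^{n+1}$ using secret key $sk = s$, compute

$$m := \left[\left\lfloor 2\cdot\frac{[\langle c, (1,s)\rangle]_q}{q}\right\rceil\right]_2 .$$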
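As a sanity check on the definitions above, here is a toy Python rendering of Regev's scheme. It is a sketch under stated assumptions, with parameters far too small for security: $\chi$ is modelled as the uniform distribution on $[-B, B]$, the $O(1)$ term in $N$ is fixed to 1, $\lceil\log q\rceil$ is computed as `(q - 1).bit_length()`, and all function names are hypothetical.

```python
import random


def sample_chi(rng, B):
    # Assumption: chi is modelled as uniform on [-B, B], a B-bounded distribution.
    return rng.randint(-B, B)


def secret_keygen(n, q, rng):
    """Regev.SecretKeygen: s <- Z_q^n."""
    return [rng.randrange(q) for _ in range(n)]


def public_keygen(s, q, B, rng):
    """Regev.PublicKeygen: P = [b || -A] with b = [A s + e]_q."""
    n = len(s)
    N = (n + 1) * ((q - 1).bit_length() + 1)  # N = (n+1)(log q + O(1)), O(1) taken as 1
    P = []
    for _ in range(N):
        a = [rng.randrange(q) for _ in range(n)]
        b = (sum(ai * si for ai, si in zip(a, s)) + sample_chi(rng, B)) % q
        P.append([b] + [(-ai) % q for ai in a])  # one row of [b || -A]
    return P


def encrypt(P, m, q, rng):
    """Regev.Enc: c = [P^T r + floor(q/2) * (m, 0, ..., 0)]_q with r <- {0,1}^N."""
    r = [rng.randrange(2) for _ in range(len(P))]
    c = [sum(rk * row[j] for rk, row in zip(r, P)) % q for j in range(len(P[0]))]
    c[0] = (c[0] + (q // 2) * m) % q
    return c


def decrypt(s, c, q):
    """Regev.Dec: [round(2 * [<c, (1, s)>]_q / q)]_2, with exact integer rounding."""
    inner = (c[0] + sum(cj * sj for cj, sj in zip(c[1:], s))) % q
    return ((4 * inner + q) // (2 * q)) % 2  # floor(2*inner/q + 1/2), then mod 2


def roundtrip(n=8, q=2**15, B=4, seed=1):
    """Encrypts and decrypts a few bits. With these defaults N * B = 144 * 4 = 576,
    well below floor(q/2)/2 = 8192, so decryption cannot fail."""
    rng = random.Random(seed)
    s = secret_keygen(n, q, rng)
    P = public_keygen(s, q, B, rng)
    bits = [0, 1, 1, 0, 1]
    return bits, [decrypt(s, encrypt(P, m, q, rng), q) for m in bits]
```

Rounding in `decrypt` is done with integers rather than `round()`, both to avoid floating-point error and because Python's `round` uses round-half-to-even.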


Correctness.  We  analyze  the  noise  magnitude  at  encryption  and  decryption.  We  start  with  a 
lemma  regarding  the  noise  magnitude  of  properly  encrypted  ciphertexts: 


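Lemma 3.1 (encryption noise). Let $q, n, N, |\chi| \leq B$ be parameters for Regev. Let $s \in \mathbb{Z}^n$ be any vector and $m \in \{0,1\}$ be some bit. Set $P \leftarrow \mathsf{Regev.PublicKeygen}(s)$ and $c \leftarrow \mathsf{Regev.Enc}_P(m)$. Then for some $e$ with $|e| \leq N\cdot B$ it holds that

$$\langle c, (1,s)\rangle = \left\lfloor \frac{q}{2}\right\rfloor\cdot m + e \pmod q .$$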
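Proof. By definition

$$\begin{aligned}
\langle c, (1,s)\rangle &= \left\langle P^T r + \left\lfloor \tfrac{q}{2}\right\rfloor\cdot\mathbf{m},\ (1,s)\right\rangle \pmod q\\
&= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m + r^T P\cdot(1,s) \pmod q\\
&= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m + r^T b - r^T A s \pmod q\\
&= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m + \langle r, e\rangle \pmod q .
\end{aligned}$$

The lemma follows since $|\langle r, e\rangle| \leq N\cdot B$. □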
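The proof can also be replayed numerically. The sketch below (toy parameters, $\chi$ again modelled as uniform on $[-B, B]$, hypothetical function names) builds one Regev ciphertext by hand from $A$, $e$ and $r$, and returns its centered noise next to $\langle r, e\rangle$ and the bound $N\cdot B$, so that both claims of the lemma can be checked.

```python
import random


def centered(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def encryption_noise(n=6, q=2**14, B=3, m=1, seed=0):
    """Returns (centered noise of a fresh ciphertext, <r, e>, N * B)."""
    rng = random.Random(seed)
    N = (n + 1) * ((q - 1).bit_length() + 1)  # N = (n+1)(log q + O(1)), O(1) := 1
    s = [rng.randrange(q) for _ in range(n)]
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(N)]
    e = [rng.randint(-B, B) for _ in range(N)]  # assumption: chi uniform on [-B, B]
    b = [(sum(a * x for a, x in zip(row, s)) + ek) % q for row, ek in zip(A, e)]
    r = [rng.randrange(2) for _ in range(N)]
    # c = [P^T r + floor(q/2) (m, 0, ..., 0)]_q with P = [b || -A]
    c0 = (sum(rk * bk for rk, bk in zip(r, b)) + (q // 2) * m) % q
    c_rest = [(-sum(rk * row[j] for rk, row in zip(r, A))) % q for j in range(n)]
    inner = c0 + sum(cj * sj for cj, sj in zip(c_rest, s))  # <c, (1, s)>
    noise = centered(inner - (q // 2) * m, q)
    r_dot_e = sum(rk * ek for rk, ek in zip(r, e))
    return noise, r_dot_e, N * B
```

With the defaults, $N\cdot B = 105\cdot 3 = 315 < q/2$, so the centered noise coincides with $\langle r, e\rangle$ exactly, as the last line of the proof predicts.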

We proceed to state the correctness of decryption for low-noise ciphertexts. The proof follows easily by substituting into the definition of Regev.Dec and is omitted.

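Lemma 3.2 (decryption noise). Let $s \in \mathbb{Z}^n$ be some vector, and let $c \in \mathbb{Z}_q^{n+1}$ be such that

$$\langle c, (1,s)\rangle = \left\lfloor \frac{q}{2}\right\rfloor\cdot m + e \pmod q,$$

with $m \in \{0,1\}$ and $|e| < \lfloor q/2\rfloor/2$. Then

$$\mathsf{Regev.Dec}_s(c) = m .$$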
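Lemma 3.2 is easy to check exhaustively for small moduli. The sketch below (hypothetical helper names; the ciphertext is crafted directly rather than produced by Regev.Enc) decrypts every ciphertext whose noise satisfies $|e| < \lfloor q/2\rfloor/2$ and confirms that the bit is always recovered. Rounding is done with integer arithmetic so that no floating-point ties arise.

```python
def regev_decrypt(s, c, q):
    """Regev.Dec_s(c) = [round(2 * [<c, (1, s)>]_q / q)]_2, ties rounded upward."""
    inner = (c[0] + sum(cj * sj for cj, sj in zip(c[1:], s))) % q
    return ((4 * inner + q) // (2 * q)) % 2


def ciphertext_with_noise(s, m, e, q):
    """Hypothetical helper: c = (floor(q/2) m + e, 0, ..., 0), so <c, (1, s)> = floor(q/2) m + e."""
    return [((q // 2) * m + e) % q] + [0] * len(s)


def admissible_noise(q):
    """All integers e with |e| < floor(q/2)/2, i.e. 2|e| < floor(q/2)."""
    bound = (q // 2 - 1) // 2
    return range(-bound, bound + 1)


def lemma_3_2_holds(q, s=(3, 1, 4)):
    """Checks Lemma 3.2 for one modulus q over every admissible noise value."""
    s = list(s)
    return all(regev_decrypt(s, ciphertext_with_noise(s, m, e, q), q) == m
               for m in (0, 1) for e in admissible_noise(q))
```

Both parities of $q$ are worth testing, since $\lfloor q/2\rfloor$ differs from $q/2$ only for odd $q$.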

Security.  The  following  lemma  states  the  security  of  Regev.  The  proof  is  standard  (see  e.g.  [Reg05]) 
and  is  omitted. 

Lemma 3.3. Let $n, q, \chi$ be some parameters such that $\mathrm{DLWE}_{n,q,\chi}$ holds. Then for any $m \in \{0,1\}$, if $s \leftarrow \mathsf{Regev.SecretKeygen}(1^n)$, $P \leftarrow \mathsf{Regev.PublicKeygen}(s)$, $c \leftarrow \mathsf{Regev.Enc}_P(m)$, it holds that the joint distribution $(P, c)$ is computationally indistinguishable from uniform over $\mathbb{Z}_q^{N\times(n+1)} \times \mathbb{Z}_q^{n+1}$.

3.2  Vector  Decomposition  and  Key  Switching 

We  show  how  to  decompose  vectors  in  a  way  that  preserves  inner  product  and  how  to  generate  and 
use  key  switching  parameters.  Our  notation  is  generally  adopted  from  [BGV12]. 

Vector  Decomposition.  We  often  break  vectors  into  their  bit  representations  as  defined  below: 

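• $\mathsf{BitDecomp}_q(x)$: For $x \in \mathbb{Z}^n$, let $w_i \in \{0,1\}^n$ be such that $x = \sum_{i=0}^{\lceil\log q\rceil-1} 2^i\cdot w_i \pmod q$. Output the vector

$$(w_0, \ldots, w_{\lceil\log q\rceil-1}) \in \{0,1\}^{n\cdot\lceil\log q\rceil}.$$

• $\mathsf{PowersOfTwo}_q(y)$: For $y \in \mathbb{Z}^n$, output

$$\left[(y, 2\cdot y, \ldots, 2^{\lceil\log q\rceil-1}\cdot y)\right]_q \in \mathbb{Z}_q^{n\cdot\lceil\log q\rceil}.$$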

We  will  usually  omit  the  subscript  q  when  it  is  clear  from  the  context. 

Claim 3.4. For all $q \in \mathbb{Z}$ and $x, y \in \mathbb{Z}^n$, it holds that

$$\langle x, y\rangle = \langle \mathsf{BitDecomp}_q(x), \mathsf{PowersOfTwo}_q(y)\rangle \pmod q .$$
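A direct Python transcription of the two maps (a sketch; the function names are hypothetical, and $\lceil\log q\rceil$ is computed exactly as `(q - 1).bit_length()`) makes the block ordering explicit: both vectors are laid out one bit level at a time, which is exactly what Claim 3.4 needs, since $\sum_i \langle w_i, 2^i y\rangle = \langle \sum_i 2^i w_i, y\rangle = \langle x, y\rangle \pmod q$.

```python
import random


def ceil_log2(q):
    """ceil(log2 q) for q >= 2, computed exactly."""
    return (q - 1).bit_length()


def bit_decomp(x, q):
    """BitDecomp_q(x) = (w_0, ..., w_{l-1}); w_i collects bit i of every entry of [x]_q."""
    xs = [xi % q for xi in x]
    return [(xi >> i) & 1 for i in range(ceil_log2(q)) for xi in xs]


def powers_of_two(y, q):
    """PowersOfTwo_q(y) = [(y, 2y, ..., 2^{l-1} y)]_q, in the same block order as bit_decomp."""
    return [(yi << i) % q for i in range(ceil_log2(q)) for yi in y]


def check_claim_3_4(q, n=5, trials=100, seed=0):
    """Checks <x, y> == <BitDecomp_q(x), PowersOfTwo_q(y)> (mod q) on random integer vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(-10 * q, 10 * q) for _ in range(n)]
        y = [rng.randint(-10 * q, 10 * q) for _ in range(n)]
        lhs = sum(a * b for a, b in zip(x, y)) % q
        rhs = sum(a * b for a, b in zip(bit_decomp(x, q), powers_of_two(y, q))) % q
        if lhs != rhs:
            return False
    return True
```

For example, with $q = 8$ and $x = (5, 3)$, `bit_decomp` returns `[1, 1, 0, 1, 1, 0]`: the low bits of both entries first, then the middle bits, then the high bits.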
Key Switching. In the functions below, $q$ is an integer and $\chi$ is a distribution over $\mathbb{Z}$:

• $\mathsf{SwitchKeyGen}_{q,\chi}(s, t)$: For a “source” key $s \in \mathbb{Z}^{n_s}$ and “target” key $t \in \mathbb{Z}^{n_t}$, we define a set of parameters that allow us to switch ciphertexts under $s$ into ciphertexts under $(1, t)$.

Let $\hat{n}_s \triangleq n_s\cdot\lceil\log q\rceil$ be the dimension of $\mathsf{PowersOfTwo}_q(s)$. Sample a uniform matrix $A_{s:t} \xleftarrow{\$} \mathbb{Z}_q^{\hat{n}_s\times n_t}$ and a noise vector $e_{s:t} \xleftarrow{\$} \chi^{\hat{n}_s}$. The function's output is a matrix

$$P_{s:t} = [\,b_{s:t} \,\|\, -A_{s:t}\,] \in \mathbb{Z}_q^{\hat{n}_s\times(n_t+1)},$$

where

$$b_{s:t} := \left[A_{s:t}\cdot t + e_{s:t} + \mathsf{PowersOfTwo}_q(s)\right]_q \in \mathbb{Z}_q^{\hat{n}_s}.$$

This is similar, although not identical, to encrypting $\mathsf{PowersOfTwo}_q(s)$ (the difference is that $\mathsf{PowersOfTwo}_q(s)$ contains non-binary values).

• $\mathsf{SwitchKey}_q(P_{s:t}, c_s)$: To switch a ciphertext from a secret key $s$ to $(1, t)$, output

$$c_t := \left[P_{s:t}^T\cdot\mathsf{BitDecomp}_q(c_s)\right]_q .$$

Again, we usually omit the subscripts when they are clear from the context. Correctness and security are stated below; the proofs follow directly from the definitions.

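Lemma 3.5 (correctness). Let $s \in \mathbb{Z}^{n_s}$, $t \in \mathbb{Z}^{n_t}$ and $c_s \in \mathbb{Z}_q^{n_s}$ be any vectors. Let $P_{s:t} \leftarrow \mathsf{SwitchKeyGen}(s, t)$ and set $c_t \leftarrow \mathsf{SwitchKey}(P_{s:t}, c_s)$. Then

$$\langle c_s, s\rangle = \langle c_t, (1, t)\rangle - \langle \mathsf{BitDecomp}_q(c_s), e_{s:t}\rangle \pmod q .$$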
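The key-switching pair can be sketched as follows. The function names, the uniform-on-$[-B, B]$ model of $\chi$ and the toy modulus are assumptions for illustration; the block re-implements $\mathsf{BitDecomp}_q$ and $\mathsf{PowersOfTwo}_q$ so that it runs on its own. The final check verifies the exact identity of Lemma 3.5 and that the key-switching noise $\langle \mathsf{BitDecomp}_q(c_s), e_{s:t}\rangle$ is at most $\hat{n}_s\cdot B$, which holds because $\mathsf{BitDecomp}_q(c_s)$ is binary.

```python
import random


def ceil_log2(q):
    return (q - 1).bit_length()  # ceil(log2 q) for q >= 2


def bit_decomp(x, q):
    xs = [xi % q for xi in x]
    return [(xi >> i) & 1 for i in range(ceil_log2(q)) for xi in xs]


def powers_of_two(y, q):
    return [(yi << i) % q for i in range(ceil_log2(q)) for yi in y]


def switch_key_gen(s, t, q, B, rng):
    """SwitchKeyGen(s, t): rows of P = [b || -A] with b = [A t + e + PowersOfTwo(s)]_q.
    Also returns e so that the noise term of Lemma 3.5 can be inspected."""
    P, e = [], []
    for p_k in powers_of_two(s, q):  # n_hat_s = n_s * ceil(log q) rows
        a = [rng.randrange(q) for _ in t]
        e_k = rng.randint(-B, B)  # assumption: chi uniform on [-B, B]
        b_k = (sum(ai * ti for ai, ti in zip(a, t)) + e_k + p_k) % q
        P.append([b_k] + [(-ai) % q for ai in a])
        e.append(e_k)
    return P, e


def switch_key(P, c_s, q):
    """SwitchKey(P, c_s) = [P^T BitDecomp(c_s)]_q, a ciphertext with respect to (1, t)."""
    bd = bit_decomp(c_s, q)
    return [sum(d * row[j] for d, row in zip(bd, P)) % q for j in range(len(P[0]))]


def check_lemma_3_5(n_s=6, n_t=4, q=3329, B=2, seed=0):
    """Returns (Lemma 3.5 identity holds, |<BitDecomp(c_s), e>| <= n_hat_s * B)."""
    rng = random.Random(seed)
    s = [rng.randrange(q) for _ in range(n_s)]
    t = [rng.randrange(q) for _ in range(n_t)]
    c_s = [rng.randrange(q) for _ in range(n_s)]
    P, e = switch_key_gen(s, t, q, B, rng)
    c_t = switch_key(P, c_s, q)
    noise = sum(d * e_k for d, e_k in zip(bit_decomp(c_s, q), e))
    lhs = sum(a * b for a, b in zip(c_s, s)) % q
    rhs = (sum(a * b for a, b in zip(c_t, [1] + t)) - noise) % q
    return lhs == rhs, abs(noise) <= len(e) * B
```

The identity holds exactly because $\langle c_t, (1,t)\rangle = \mathsf{BitDecomp}(c_s)^T(b_{s:t} - A_{s:t}t) = \langle \mathsf{BitDecomp}(c_s), e_{s:t}\rangle + \langle c_s, s\rangle \pmod q$, using Claim 3.4 for the last term.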

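Lemma 3.6 (security). Let $s \in \mathbb{Z}^{n_s}$ be any vector. If we generate $t \leftarrow \mathsf{Regev.SecretKeygen}(1^n)$ and $P \leftarrow \mathsf{SwitchKeyGen}_{q,\chi}(s, t)$, then $P$ is computationally indistinguishable from uniform over $\mathbb{Z}_q^{\hat{n}_s\times(n+1)}$, assuming $\mathrm{DLWE}_{n,q,\chi}$.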

4  A  Scale  Invariant  Homomorphic  Encryption  Scheme 

We  present  our  scale  invariant  L-homomorphic  scheme  as  outlined  in  Section  1.2.  Homomorphic 
properties  are  discussed  in  Section  4.1,  implications  and  optimizations  are  discussed  in  Section  4.2. 

Let $q = q(n)$ be an integer function, let $L = L(n)$ be a polynomial and let $\chi = \chi(n)$ be a distribution ensemble over $\mathbb{Z}$. The scheme SI-HE is defined as follows:

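• $\mathsf{SI\text{-}HE.Keygen}(1^L, 1^n)$: Sample $L+1$ vectors $s_0, \ldots, s_L \leftarrow \mathsf{Regev.SecretKeygen}(1^n)$, and compute a Regev public key for the first one: $P_0 \leftarrow \mathsf{Regev.PublicKeygen}(s_0)$. For all $i \in [L]$, define

$$\tilde{s}_{i-1} := \mathsf{BitDecomp}((1, s_{i-1})) \otimes \mathsf{BitDecomp}((1, s_{i-1})) \in \{0,1\}^{((n+1)\lceil\log q\rceil)^2},$$

and compute

$$P_{(i-1):i} \leftarrow \mathsf{SwitchKeyGen}(\tilde{s}_{i-1}, s_i).$$

Output $pk = P_0$, $evk = \{P_{(i-1):i}\}_{i\in[L]}$ and $sk = s_L$.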
• $\mathsf{SI\text{-}HE.Enc}_{pk}(m)$: Identical to Regev's, output $c \leftarrow \mathsf{Regev.Enc}_{pk}(m)$.

• $\mathsf{SI\text{-}HE.Eval}_{evk}(\cdot)$: As usual, we describe homomorphic addition and multiplication over GF(2), which allows us to evaluate depth-$L$ arithmetic circuits in a gate-by-gate manner. The convention for a gate at level $i$ of the circuit is that the operand ciphertexts are decryptable using $s_{i-1}$, and the output of the homomorphic operation is decryptable using $s_i$.

Since $evk$ contains key switching parameters from $\tilde{s}_{i-1}$ to $s_i$, homomorphic addition and multiplication both first produce an intermediate output $\tilde{c}$ that corresponds to $\tilde{s}_{i-1}$, and then use key switching to obtain the final output.¹⁰

– $\mathsf{SI\text{-}HE.Add}_{evk}(c_1, c_2)$: Assume w.l.o.g. that both input ciphertexts are encrypted under the same secret key $s_{i-1}$. First compute

$$\tilde{c}_{add} := \mathsf{PowersOfTwo}(c_1 + c_2) \otimes \mathsf{PowersOfTwo}((1, 0, \ldots, 0)),$$

then output

$$c_{add} \leftarrow \mathsf{SwitchKey}(P_{(i-1):i}, \tilde{c}_{add}) \in \mathbb{Z}_q^{n+1}.$$

Let us explain what we did: We first added the ciphertext vectors (as expected) to obtain $c_1 + c_2$. This already implements the homomorphic addition, but provides an output that corresponds to $s_{i-1}$ and not $s_i$ as required. We thus generate $\tilde{c}_{add}$ by tensoring with a “trivial” ciphertext. The result corresponds to $\tilde{s}_{i-1}$, and allows us to finally use key switching to obtain an output corresponding to $s_i$. We use powers-of-two representation in order to control the norm of the secret key (as we explain in Section 1.2).

– $\mathsf{SI\text{-}HE.Mult}_{evk}(c_1, c_2)$: Assume w.l.o.g. that both input ciphertexts are encrypted under the same secret key $s_{i-1}$. First compute

$$\tilde{c}_{mult} := \left\lfloor \frac{2}{q}\cdot\left(\mathsf{PowersOfTwo}(c_1) \otimes \mathsf{PowersOfTwo}(c_2)\right)\right\rceil,$$

then output

$$c_{mult} \leftarrow \mathsf{SwitchKey}(P_{(i-1):i}, \tilde{c}_{mult}) \in \mathbb{Z}_q^{n+1}.$$

As we explain in Section 1.2, the tensored ciphertext $\tilde{c}_{mult}$ mimics tensoring in the “invariant perspective”, which produces an encryption of the product of the plaintexts under the tensored secret key $\tilde{s}_{i-1}$. We then switch keys to obtain an output corresponding to $s_i$.


• Decryption $\mathsf{SI\text{-}HE.Dec}_{sk}(c)$: Assume w.l.o.g. that $c$ is a ciphertext that corresponds to $s_L$ ($= sk$). Then decryption is again identical to Regev's, output

$$m \leftarrow \mathsf{Regev.Dec}_{sk}(c).$$
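The two intermediate ciphertexts $\tilde{c}_{add}$ and $\tilde{c}_{mult}$ can be tried out directly. The sketch below stops before the final SwitchKey step and instead decrypts under the tensored key $\tilde{s}_{i-1}$ itself, which is enough to see scale invariance at work: the product is recovered after rounding $\frac{2}{q}\cdot(\cdots)$, with noise governed by $n\log q$ rather than by $q$. Assumptions: fresh ciphertexts come from a hypothetical symmetric-key variant of Regev's encryption with $\langle c, (1,s)\rangle = \lfloor q/2\rfloor m + e$; $\chi$ is uniform on $[-B, B]$; powers-of-two entries are taken in centered form (as in the bound on $I_1$ in Section 4.3); parameters and names are toy and hypothetical.

```python
import random


def ceil_log2(q):
    return (q - 1).bit_length()  # ceil(log2 q) for q >= 2


def bit_decomp(x, q):
    xs = [xi % q for xi in x]
    return [(xi >> i) & 1 for i in range(ceil_log2(q)) for xi in xs]


def powers_of_two(y, q):
    return [(yi << i) % q for i in range(ceil_log2(q)) for yi in y]


def centered(x, q):
    x %= q
    return x - q if x > q // 2 else x


def tensor(u, v):
    return [a * b for a in u for b in v]


def encrypt_sym(s, m, q, B, rng):
    """Hypothetical symmetric-key Regev ciphertext: <c, (1, s)> = floor(q/2) m + e (mod q)."""
    a = [rng.randrange(q) for _ in s]
    e = rng.randint(-B, B)  # assumption: chi uniform on [-B, B]
    b = (sum(x * y for x, y in zip(a, s)) + (q // 2) * m + e) % q
    return [b] + [(-x) % q for x in a]


def tilde_key(s, q):
    """s~ = BitDecomp((1, s)) (x) BitDecomp((1, s)), a binary vector."""
    bd = bit_decomp([1] + s, q)
    return tensor(bd, bd)


def add_tilde(c1, c2, q):
    """c~_add = PowersOfTwo(c1 + c2) (x) PowersOfTwo((1, 0, ..., 0))."""
    unit = [1] + [0] * (len(c1) - 1)
    return tensor(powers_of_two([a + b for a, b in zip(c1, c2)], q), powers_of_two(unit, q))


def mult_tilde(c1, c2, q):
    """c~_mult = round((2/q) * PowersOfTwo(c1) (x) PowersOfTwo(c2)), with centered entries."""
    u = [centered(x, q) for x in powers_of_two(c1, q)]
    v = [centered(x, q) for x in powers_of_two(c2, q)]
    return [(4 * a * b + q) // (2 * q) for a in u for b in v]  # exact round(2ab/q), ties up


def decrypt_tilde(c, s_t, q):
    """Regev-style decryption under s~; returns (bit, centered noise)."""
    inner = sum(x * y for x, y in zip(c, s_t)) % q
    bit = ((4 * inner + q) // (2 * q)) % 2
    return bit, centered(inner - (q // 2) * bit, q)


def demo(n=4, q=65537, B=3, seed=0):
    """For all bit pairs: (m1, m2, add bit, mult bit, add noise, mult noise)."""
    rng = random.Random(seed)
    s = [rng.randrange(q) for _ in range(n)]
    s_t = tilde_key(s, q)
    results = []
    for m1 in (0, 1):
        for m2 in (0, 1):
            c1 = encrypt_sym(s, m1, q, B, rng)
            c2 = encrypt_sym(s, m2, q, B, rng)
            add_bit, add_noise = decrypt_tilde(add_tilde(c1, c2, q), s_t, q)
            mult_bit, mult_noise = decrypt_tilde(mult_tilde(c1, c2, q), s_t, q)
            results.append((m1, m2, add_bit, mult_bit, add_noise, mult_noise))
    return results
```

With the defaults, $(n+1)\lceil\log q\rceil = 85$, so the worst-case bounds of Section 4.3 give $|\delta_2| \le 85^2/2$ and $|\delta_3| \le 2\cdot 7\cdot 43.5 + 7 < 620$, comfortably below $\lfloor q/2\rfloor/2 = 16384$; the addition noise is at most $1 + 2B = 7$.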

¹⁰ The final key switching replaces the more complicated “refresh” operation of [BGV12].
Security. The security of the scheme follows in a straightforward way, very similarly to the proof of [BV11b, Theorem 4.1], as we sketch below.

Lemma 4.1. Let $n, q, \chi$ be some parameters such that $\mathrm{DLWE}_{n,q,\chi}$ holds, and let $L = L(n)$ be polynomially bounded. Then for any $m \in \{0,1\}$, if $(pk, evk, sk) \leftarrow \mathsf{SI\text{-}HE.Keygen}(1^L, 1^n)$, $c \leftarrow \mathsf{SI\text{-}HE.Enc}_{pk}(m)$, it holds that the joint distribution $(pk, evk, c)$ is computationally indistinguishable from uniform.

Proof sketch. We consider the distribution $(pk, evk, c) = (P_0, P_{0:1}, \ldots, P_{L-1:L}, c)$ and apply a hybrid argument.

First, we argue that $P_{L-1:L}$ is indistinguishable from uniform, based on Lemma 3.6 (note that $s_L$ is only used to generate $P_{L-1:L}$). We then proceed to replace all $P_{i-1:i}$ with uniform in descending order, based on the same argument. Finally, we are left with $(P_0, c)$ (and a multitude of uniform elements), which are exactly a public key and ciphertext of Regev's scheme. We invoke Lemma 3.3 to argue that $(P_0, c)$ are indistinguishable from uniform, which completes the proof of our lemma.

We remark that generally one has to be careful when using a super-constant number of hybrids, but in our case, as in [BV11b], this causes no problem. □

4.1  Homomorphic  Properties  of  SI-HE 

The  following  theorem  summarizes  the  homomorphic  properties  of  our  scheme. 

Theorem 4.2. The scheme SI-HE with parameters $n, q, |\chi| \leq B, L$ for which

$$q/B \geq (O(n\log q))^{L+O(1)},$$

is $L$-homomorphic.

The  theorem  is  proven  using  the  following  lemma,  which  bounds  the  growth  of  the  noise  in  gate 
evaluation. 


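Lemma 4.3. Let $q, n, |\chi| \leq B, L$ be parameters for SI-HE, and let $(pk, evk, sk) \leftarrow \mathsf{SI\text{-}HE.Keygen}(1^L, 1^n)$. Let $c_1, c_2$ be such that

$$\begin{aligned}
\langle c_1, (1, s_{i-1})\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 + e_1 \pmod q\\
\langle c_2, (1, s_{i-1})\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_2 + e_2 \pmod q, \qquad (1)
\end{aligned}$$

with $|e_1|, |e_2| \leq E < \lfloor q/2\rfloor/2$. Define

$$c_{add} \leftarrow \mathsf{SI\text{-}HE.Add}_{evk}(c_1, c_2), \qquad c_{mult} \leftarrow \mathsf{SI\text{-}HE.Mult}_{evk}(c_1, c_2).$$

Then

$$\begin{aligned}
\langle c_{add}, (1, s_i)\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot [m_1 + m_2]_2 + e_{add} \pmod q\\
\langle c_{mult}, (1, s_i)\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 m_2 + e_{mult} \pmod q,
\end{aligned}$$

where

$$|e_{add}|, |e_{mult}| \leq O(n\log q)\cdot\max\{E, (n\log^2 q)\cdot B\}.$$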

We  remark  that,  as  usual,  homomorphic  addition  increases  noise  much  more  moderately  than 
multiplication,  but  the  coarse  bound  we  show  in  the  lemma  is  sufficient  for  our  purposes. 

Next  we  show  how  to  use  Lemma  4.3  to  prove  Theorem  4.2.  The  proof  of  Lemma  4.3  itself  is 
deferred  to  Section  4.3. 
Proof of Theorem 4.2. Consider the evaluation of a depth-$L$ circuit. Let $E_i$ be a bound on the noise in the ciphertext after the evaluation of the $i$th level of gates.

By Lemma 3.1, $E_0 = N\cdot B = O(n\log q)\cdot B$. Lemma 4.3 guarantees that starting from the point where $E \geq (n\log^2 q)\cdot B$, it will hold that $E_{i+1} = O(n\log q)\cdot E_i$. We get that $E_L = (O(n\log q))^{L+O(1)}\cdot B$.

By Lemma 3.2, decryption will succeed if $E_L < \lfloor q/2\rfloor/2$ and the theorem follows. □

4.2  Implications  and  Optimizations 

Fully Homomorphic Encryption using Bootstrapping. Fully homomorphic encryption follows using the bootstrapping theorem (Theorem 2.2). In order to use bootstrapping, we need to bound the depth of the decryption circuit. The following lemma has been proven in a number of previous works (e.g. [BV11b, Lemma 4.5]):

Lemma 4.4. For all $c$, the function $f_c(s) = \mathsf{SI\text{-}HE.Dec}_s(c)$ can be implemented by a circuit of depth $O(\log n + \log\log q)$.

An  immediate  corollary  follows  from  Theorem  2.2,  Theorem  4.2  and  Lemma  4.4: 

Corollary 4.5. Let $n, q, \chi, B$ be such that $|\chi| \leq B$ and $q/B \geq (n\log q)^{O(\log n + \log\log q)}$. Then there exists a (leveled) fully homomorphic encryption scheme based on the $\mathrm{DLWE}_{n,q,\chi}$ assumption.

Furthermore, if SI-HE is weak circular secure, then the same assumption implies full (non-leveled) homomorphism.

Finally, we can classically reduce security from GapSVP using Corollary 2.1, by choosing $q = \tilde{O}(2^{n/2})$ and $B = q/(n\log q)^{O(\log n + \log\log q)} = q/n^{O(\log n)}$:

Corollary 4.6. There exists a (leveled) fully homomorphic encryption scheme based on the classical worst-case hardness of the $\mathrm{GapSVP}_{n^{O(\log n)}}$ problem.

(Leveled) Fully Homomorphic Encryption without Bootstrapping. Following [BGV12], our scheme implies a leveled fully homomorphic encryption without bootstrapping. Plugging our scheme into the [BGV12] framework, we obtain a (leveled) fully homomorphic encryption without bootstrapping, based on the classical worst-case hardness of $\mathrm{GapSVP}_{2^{n^\epsilon}}$, for any $\epsilon > 0$.

Optimizations. So far, we chose to present our scheme in the cleanest possible way. However, there are a few techniques that can somewhat improve performance. While the asymptotic advantage of some of these methods is not great, a real-life implementation can benefit from them.

1. Our tensored secret key $\tilde{s}_{i-1}$ is obtained by tensoring a vector with itself. Such a vector can be represented by only $\binom{n}{2}$ entries (as opposed to our $n^2$), saving a factor of (almost) 2 in the representation length.

2. When $B \ll q$, some improvement can be achieved by using LWE in Hermite normal form. It is known (see e.g. [ACPS09]) that the hardness of LWE remains essentially the same if we sample $s \xleftarrow{\$} \chi^n$ (instead of uniformly in $\mathbb{Z}_q^n$). Sampling our keys this way, we only need $O(n\log B)$ bits to represent $\mathsf{BitDecomp}(s)$, and its norm goes down accordingly.
We can therefore reduce the size of the evaluation key (which depends quadratically on the bit length of the secret key), and more importantly, we can prove a tighter version of Lemma 4.3. When using Hermite normal form, the noise grows from $E$ to $O(n\log B)\cdot\max\{E, (n\log B\log q)\cdot B\}$. Therefore, $L$-homomorphism is achieved whenever

$$q/B \geq (O(n\log B))^{L+O(1)}\cdot\log q .$$

3. The least significant bits of the ciphertext can sometimes be truncated without much harm, which can lead to significant saving in ciphertext and key length, especially when $B/q$ is large: Let $c$ be an integer ciphertext vector and define $c' = \lfloor 2^{-i}\cdot c\rceil$. Then $c'$, which can be represented with $n\cdot i$ fewer bits than $c$, implies a good approximation for $c$ since

$$\left|\langle c, s\rangle - \langle 2^i\cdot c', s\rangle\right| \leq 2^{i-1}\cdot\|s\|_1 .$$

This means that $2^i\cdot c'$ can be used instead of $c$, at the cost of an additive increase in the noise magnitude.

Consider a case where $q, B \gg q/B$ (which occurs when we artificially increase $q$ in order for the classical reduction to work). Recall that $\|s\|_1 \approx n\log q$ and consider truncating with $i \approx \log(B/(n\log q))$. Then the additional noise incurred by using $c'$ instead of $c$ is only an insignificant $\approx B$. The number of bits required to represent each element in $c'$, however, now becomes $\log q - i \approx \log(q/B) + \log(n\log q)$. In conclusion, we hardly lose anything in ciphertext length compared to the case of working with smaller $q, B$ to begin with (with similar $q/B$ ratio). The ciphertext length can, therefore, be made invariant to the absolute values of $q, B$, and depend only on their ratio. This of course applies also to the vectors in $evk$.
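A small numerical sketch of the truncation trick follows, with integer rounding (ties rounded up) and a hypothetical binary key $s$, so that $\|s\|_1 \le n$; the dimension, modulus and shift $i$ are arbitrary toy choices. It checks the stated bound $|\langle c, s\rangle - \langle 2^i\cdot c', s\rangle| \le 2^{i-1}\cdot\|s\|_1$, which holds because every entry of $c - 2^i\cdot c'$ has magnitude at most $2^{i-1}$.

```python
import random


def truncate(c, i):
    """c' = round(2^{-i} c) entrywise for i >= 1, via integer shifts (ties rounded up)."""
    return [(x + (1 << (i - 1))) >> i for x in c]


def truncation_error(n=16, q=2**40, i=20, seed=0):
    """Returns (|<c, s> - <2^i c', s>|, 2^{i-1} * ||s||_1) for random c and a binary s."""
    rng = random.Random(seed)
    s = [rng.randrange(2) for _ in range(n)]  # hypothetical binary key, ||s||_1 <= n
    c = [rng.randrange(q) for _ in range(n)]
    restored = [x << i for x in truncate(c, i)]  # 2^i * c'
    err = abs(sum(a * b for a, b in zip(c, s)) - sum(a * b for a, b in zip(restored, s)))
    return err, (1 << (i - 1)) * sum(abs(x) for x in s)
```

For instance, `truncate([5, 6, 7, -5], 1)` gives `[3, 3, 4, -2]`: each entry is halved and rounded to the nearest integer, with halves rounded upward.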

4.3  Proof  of  Lemma  4.3 

We  start  with  the  analysis  for  addition,  which  is  simpler  and  will  also  serve  as  good  warm-up 
towards  the  analysis  for  multiplication. 

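Analysis for Addition. By Lemma 3.5, it holds that

$$\langle c_{add}, (1, s_i)\rangle = \langle \tilde{c}_{add}, \tilde{s}_{i-1}\rangle + \underbrace{\langle \mathsf{BitDecomp}(\tilde{c}_{add}), e_{i-1:i}\rangle}_{\delta_1} \pmod q,$$

where $e_{i-1:i} \sim \chi^{(n+1)^2\cdot\lceil\log q\rceil^3}$. That is, $\delta_1$ is the noise inflicted by the key switching process.

We bound $|\delta_1|$ using the bound on $\chi$:

$$|\delta_1| = |\langle \mathsf{BitDecomp}(\tilde{c}_{add}), e_{i-1:i}\rangle| \leq (n+1)^2\cdot\lceil\log q\rceil^3\cdot B = O(n^2\log^3 q)\cdot B .$$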

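Next, we expand the term $\langle \tilde{c}_{add}, \tilde{s}_{i-1}\rangle$ by breaking an inner product of tensors into a product of inner products (one of which is trivially equal to 1):

$$\begin{aligned}
\langle \tilde{c}_{add}, \tilde{s}_{i-1}\rangle &= \big\langle \mathsf{PowersOfTwo}(c_1 + c_2) \otimes \mathsf{PowersOfTwo}((1, 0, \ldots, 0)),\ \mathsf{BitDecomp}((1, s_{i-1})) \otimes \mathsf{BitDecomp}((1, s_{i-1}))\big\rangle\\
&= \langle \mathsf{PowersOfTwo}(c_1 + c_2), \mathsf{BitDecomp}((1, s_{i-1}))\rangle\cdot 1\\
&= \langle c_1 + c_2, (1, s_{i-1})\rangle \pmod q\\
&= \langle c_1, (1, s_{i-1})\rangle + \langle c_2, (1, s_{i-1})\rangle \pmod q .
\end{aligned}$$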
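We can now plug in what we know about $c_1, c_2$ from Eq. (1) in the lemma statement:

$$\begin{aligned}
\langle \tilde{c}_{add}, \tilde{s}_{i-1}\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 + e_1 + \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_2 + e_2 \pmod q\\
&= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot [m_1 + m_2]_2 + \underbrace{(-\tilde{m} + e_1 + e_2)}_{=\delta_2} \pmod q,
\end{aligned}$$

where $\tilde{m} \in \{0,1\}$ is defined as

$$\tilde{m} \triangleq \begin{cases} 0, & \text{if } q \text{ is even},\\ \tfrac{1}{2}\cdot(m_1 + m_2 - [m_1 + m_2]_2), & \text{if } q \text{ is odd},\end{cases}$$

and $|\delta_2| \leq 1 + 2E$.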

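Putting it all together,

$$\langle c_{add}, (1, s_i)\rangle = \left\lfloor \tfrac{q}{2}\right\rfloor\cdot [m_1 + m_2]_2 + \underbrace{\delta_1 + \delta_2}_{=e_{add}} \pmod q,$$

where the bound on $e_{add}$ is

$$|e_{add}| = |\delta_1 + \delta_2| \leq O(n^2\log^3 q)\cdot B + O(1)\cdot E \leq O(n\log q)\cdot\max\{E, (n\log^2 q)\cdot B\}.$$

This finishes the argument for addition.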
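The case analysis behind $\tilde{m}$ is easy to confirm mechanically. The hypothetical check below verifies, for every bit pair and a range of moduli, that $\lfloor q/2\rfloor\cdot(m_1 + m_2) \equiv \lfloor q/2\rfloor\cdot[m_1 + m_2]_2 - \tilde{m} \pmod q$. The only nontrivial case is $m_1 = m_2 = 1$ with $q$ odd, where $2\lfloor q/2\rfloor = q - 1 \equiv -1$.

```python
def m_tilde(m1, m2, q):
    """m~ from the addition analysis: 0 for even q, (m1 + m2 - [m1 + m2]_2) / 2 for odd q."""
    if q % 2 == 0:
        return 0
    return (m1 + m2 - (m1 + m2) % 2) // 2


def addition_identity_holds(q):
    """floor(q/2)(m1 + m2) == floor(q/2)[m1 + m2]_2 - m~  (mod q) for all bits m1, m2."""
    h = q // 2
    return all((h * (m1 + m2) - (h * ((m1 + m2) % 2) - m_tilde(m1, m2, q))) % q == 0
               for m1 in (0, 1) for m2 in (0, 1))
```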


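Analysis for Multiplication. The analysis for multiplication starts very similarly to addition:

$$\langle c_{mult}, (1, s_i)\rangle = \langle \tilde{c}_{mult}, \tilde{s}_{i-1}\rangle + \underbrace{\langle \mathsf{BitDecomp}(\tilde{c}_{mult}), e_{i-1:i}\rangle}_{\delta_1} \pmod q,$$

and as before

$$|\delta_1| = O(n^2\log^3 q)\cdot B .$$

Let us now focus on $\langle \tilde{c}_{mult}, \tilde{s}_{i-1}\rangle$. We want to use the properties of tensoring to break the inner product into two smaller inner products, as we did before. This time, however, $\tilde{c}_{mult}$ is a rounded tensor:

$$\langle \tilde{c}_{mult}, \tilde{s}_{i-1}\rangle = \left\langle \left\lfloor \frac{2}{q}\cdot\left(\mathsf{PowersOfTwo}(c_1) \otimes \mathsf{PowersOfTwo}(c_2)\right)\right\rceil, \tilde{s}_{i-1}\right\rangle \pmod q .$$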


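We start by showing that the rounding does not add much noise. Intuitively this is because $\tilde{s}_{i-1}$ is a binary vector and thus has low norm. We define

$$\delta_2 \triangleq \left\langle \left\lfloor \frac{2}{q}\cdot\left(\mathsf{PowersOfTwo}(c_1) \otimes \mathsf{PowersOfTwo}(c_2)\right)\right\rceil, \tilde{s}_{i-1}\right\rangle - \left\langle \frac{2}{q}\cdot\left(\mathsf{PowersOfTwo}(c_1) \otimes \mathsf{PowersOfTwo}(c_2)\right), \tilde{s}_{i-1}\right\rangle,$$

and for convenience we also define

$$c' \triangleq \left\lfloor \frac{2}{q}\cdot\left(\mathsf{PowersOfTwo}(c_1) \otimes \mathsf{PowersOfTwo}(c_2)\right)\right\rceil - \frac{2}{q}\cdot\left(\mathsf{PowersOfTwo}(c_1) \otimes \mathsf{PowersOfTwo}(c_2)\right).$$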
By definition, $\delta_2 = \langle c', \tilde{s}_{i-1}\rangle$. Now, since $\|c'\|_\infty \leq 1/2$ and $\|\tilde{s}_{i-1}\|_1 \leq ((n+1)\lceil\log q\rceil)^2 = O(n^2\log^2 q)$, it follows that

$$|\delta_2| \leq \|c'\|_\infty\cdot\|\tilde{s}_{i-1}\|_1 = O(n^2\log^2 q).$$

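We can now break the inner product using the properties of tensoring:

$$\langle \tilde{c}_{mult}, \tilde{s}_{i-1}\rangle - \delta_2 = \frac{2}{q}\cdot\langle \mathsf{PowersOfTwo}(c_1), \mathsf{BitDecomp}((1, s_{i-1}))\rangle\cdot\langle \mathsf{PowersOfTwo}(c_2), \mathsf{BitDecomp}((1, s_{i-1}))\rangle . \qquad (2)$$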

Note that we keep $c_1, c_2$ in powers-of-two form. This is deliberate and will be useful later (essentially because we want our ciphertext to relate to low-norm secrets).

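Going back to Eq. (1) from the lemma statement, it follows (using Claim 3.4) that

$$\begin{aligned}
\langle \mathsf{PowersOfTwo}(c_1), \mathsf{BitDecomp}((1, s_{i-1}))\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 + e_1 \pmod q\\
\langle \mathsf{PowersOfTwo}(c_2), \mathsf{BitDecomp}((1, s_{i-1}))\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_2 + e_2 \pmod q .
\end{aligned}$$

Let $I_1, I_2 \in \mathbb{Z}$ be such that

$$\begin{aligned}
\langle \mathsf{PowersOfTwo}(c_1), \mathsf{BitDecomp}((1, s_{i-1}))\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 + e_1 + q\cdot I_1\\
\langle \mathsf{PowersOfTwo}(c_2), \mathsf{BitDecomp}((1, s_{i-1}))\rangle &= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_2 + e_2 + q\cdot I_2 . \qquad (3)
\end{aligned}$$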

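Let us bound the absolute value of $I_1$ (obviously the same bound also holds for $I_2$):

$$\begin{aligned}
|I_1| &= \frac{\left|\langle \mathsf{PowersOfTwo}(c_1), \mathsf{BitDecomp}((1, s_{i-1}))\rangle - \lfloor q/2\rfloor\cdot m_1 - e_1\right|}{q}\\
&\leq \frac{\left|\langle \mathsf{PowersOfTwo}(c_1), \mathsf{BitDecomp}((1, s_{i-1}))\rangle\right|}{q} + 1\\
&\leq \frac{\|\mathsf{PowersOfTwo}(c_1)\|_\infty}{q}\cdot\|\mathsf{BitDecomp}((1, s_{i-1}))\|_1 + 1\\
&\leq \frac{1}{2}\cdot\|\mathsf{BitDecomp}((1, s_{i-1}))\|_1 + 1\\
&\leq \frac{1}{2}\cdot(n+1)\lceil\log q\rceil + 1\\
&= O(n\log q).
\end{aligned}$$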

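Plugging Eq. (3) into Eq. (2), we get

$$\begin{aligned}
\langle \tilde{c}_{mult}, \tilde{s}_{i-1}\rangle - \delta_2 &= \frac{2}{q}\cdot\left(\left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 + e_1 + q\cdot I_1\right)\cdot\left(\left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_2 + e_2 + q\cdot I_2\right)\\
&= \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 m_2 + \delta_3 + q\cdot(m_1 I_2 + m_2 I_1 + 2 I_1 I_2), \qquad (4)
\end{aligned}$$

where $\delta_3$ is defined as

$$\delta_3 \triangleq \begin{cases}
2e_2\cdot I_1 + 2e_1\cdot I_2 + (e_1 m_2 + e_2 m_1) + \dfrac{2e_1 e_2}{q}, & \text{if } q \text{ is even},\\[1ex]
(2e_2 - m_2)\cdot I_1 + (2e_1 - m_1)\cdot I_2 + \dfrac{q-1}{q}\cdot(e_1 m_2 + e_2 m_1) - \dfrac{m_1 m_2}{2} + \dfrac{m_1 m_2 + 4e_1 e_2}{2q}, & \text{if } q \text{ is odd}.
\end{cases}$$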
In particular (recall that $E < \lfloor q/2\rfloor/2 \leq q/4$):

$$|\delta_3| \leq 2(2E+1)\cdot O(n\log q) + 2E + 1 + \frac{2E^2}{q} = O(n\log q)\cdot E .$$

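Putting everything together, we get that

$$\langle c_{mult}, (1, s_i)\rangle = \left\lfloor \tfrac{q}{2}\right\rfloor\cdot m_1 m_2 + \underbrace{\delta_1 + \delta_2 + \delta_3}_{=e_{mult}} \pmod q,$$

where

$$|e_{mult}| = |\delta_1 + \delta_2 + \delta_3| \leq O(n\log q)\cdot E + O(n^2\log^3 q)\cdot B,$$

and the lemma follows. □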
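Because Eq. (4) and the case split for $\delta_3$ are pure algebra, they can be verified exactly with rational arithmetic. The sketch below (hypothetical function names; Python's standard `fractions.Fraction`) samples bits $m_1, m_2$, noise values with $|e_j| \le E < \lfloor q/2\rfloor/2$ and arbitrary integers $I_1, I_2$ (the identity holds for all integers, so their range is an arbitrary choice), then checks both Eq. (4) and the bound $|\delta_3| \le 2(2E+1)\max(|I_1|, |I_2|) + 2E + 1 + 2E^2/q$ used above.

```python
import random
from fractions import Fraction


def delta3(m1, m2, e1, e2, I1, I2, q):
    """delta_3 as defined in the multiplication analysis (exact rational value)."""
    if q % 2 == 0:
        return 2 * e2 * I1 + 2 * e1 * I2 + (e1 * m2 + e2 * m1) + Fraction(2 * e1 * e2, q)
    return ((2 * e2 - m2) * I1 + (2 * e1 - m1) * I2
            + Fraction(q - 1, q) * (e1 * m2 + e2 * m1)
            - Fraction(m1 * m2, 2) + Fraction(m1 * m2 + 4 * e1 * e2, 2 * q))


def check_eq4(q, trials=500, seed=0):
    """Checks Eq. (4) exactly and the bound on |delta_3| for random admissible inputs."""
    rng = random.Random(seed)
    h = q // 2
    E = (h - 1) // 2  # largest integer E with E < floor(q/2)/2
    for _ in range(trials):
        m1, m2 = rng.randrange(2), rng.randrange(2)
        e1, e2 = rng.randint(-E, E), rng.randint(-E, E)
        I1, I2 = rng.randint(-100, 100), rng.randint(-100, 100)
        lhs = Fraction(2, q) * (h * m1 + e1 + q * I1) * (h * m2 + e2 + q * I2)
        d3 = delta3(m1, m2, e1, e2, I1, I2, q)
        rhs = h * m1 * m2 + d3 + q * (m1 * I2 + m2 * I1 + 2 * I1 * I2)
        bound = 2 * (2 * E + 1) * max(abs(I1), abs(I2)) + 2 * E + 1 + Fraction(2 * E * E, q)
        if lhs != rhs or abs(d3) > bound:
            return False
    return True
```

Checking both an even and an odd modulus exercises the two branches of the definition of $\delta_3$.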
When Homomorphism Becomes a Liability


Zvika  Brakerski* 


Abstract 

We show that an encryption scheme cannot have a simple decryption function and be homomorphic at the same time, even with added noise. Specifically, if a scheme can homomorphically evaluate the majority function, then its decryption cannot be weakly-learnable (in particular, linear), even if large decryption error is allowed. (In contrast, without homomorphism, such schemes do exist and are presumed secure, e.g. based on LPN.)

An  immediate  corollary  is  that  known  schemes  that  are  based  on  the  hardness  of  decoding  in 
the presence of low Hamming-weight noise cannot be fully homomorphic. This applies to known
schemes  such  as  LPN-based  symmetric  or  public  key  encryption. 

Using  these  techniques,  we  show  that  the  recent  candidate  fully  homomorphic  encryption, 
suggested by Bogdanov and Lee (ePrint ’11, henceforth BL), is insecure. In fact, we show two
attacks  on  the  BL  scheme:  One  that  uses  homomorphism,  and  another  that  directly  attacks  a 
component  of  the  scheme. 
1 Introduction


An encryption scheme is called homomorphic if there is an efficient transformation that given $\mathsf{Enc}(m)$ for some message $m$, and a function $f$, produces $\mathsf{Enc}(f(m))$ using only public information. A scheme that is homomorphic w.r.t. all efficient $f$ is called fully homomorphic (FHE). Homomorphic encryption is a useful tool in both theory and practice and has been extensively researched in recent years (see [Vai11] for a survey), and a few candidates for full homomorphism are known.

Most of these candidates [Gen09, Gen10, SV10, BV11a, BV11b, GH11, BGV12, GHS12, Bra12] are based (either explicitly or implicitly) on lattice assumptions (the hardness of approximating short vectors in certain lattices). In particular, the learning with errors (LWE) assumption proved to be very useful in the design of such schemes. The one notable exception is [vDGHV10], but even that could be thought of as working over an appropriately defined lattice over the integers.

An  important  open  problem  is,  therefore,  to  diversify  and  base  fully  homomorphic  encryption 
on  different  assumptions  (so  as  to  not  put  all  the  eggs  in  one  basket).  One  appealing  direction  is  to 
try  to  use  the  learning  parity  with  noise  (LPN)  problem,  which  is  very  similar  in  syntax  to  LWE: 
Making  a  vast  generalization,  LWE  can  be  interpreted  as  a  decoding  problem  for  a  linear  code, 
where  the  noise  comes  from  a  family  of  low  norm  vectors.  Namely,  each  coordinate  in  the  code 
suffers  from  noise,  but  this  noise  is  relatively  small  (this  requires  that  the  code  is  defined  over  a 
large  alphabet).  The  LPN  assumption  works  over  the  binary  alphabet  and  requires  that  the  noise 
has low Hamming weight, namely that only a small number of coordinates are noisy, but in these
coordinates  the  noise  amplitude  can  be  large.  While  similar  in  syntax,  a  direct  connection  between 
these  two  types  of  assumptions  is  not  known. 

While an LPN-based construction is not known, recently Bogdanov and Lee [BL11] presented a candidate, denoted by BL throughout this manuscript, that is based on a different low-Hamming-weight decoding problem: They consider a carefully crafted code over a large alphabet and assume that decoding in the presence of low-Hamming-weight noise is hard.

In this work, we show not only that BL's construction is insecure, but that the entire approach of constructing code-based homomorphic encryption analogously to the LWE construction cannot work. We stress that we don't show that FHE cannot be based on LPN (or other code-based assumptions), but rather that the decryption algorithm of such a scheme cannot take the naive form. (In particular this applies to the attempt to add homomorphism to schemes such as [Ale03, GRS08, ACPS09].)

1.1  Our  Results 

Our main result shows that encryption schemes with learnable decryption functions cannot be homomorphic, even if large decryption error is allowed. In particular, such schemes cannot evaluate the majority function. This extends the result of Kearns and Valiant [KV94] (slightly extended by Klivans and Sherstov [KS09]) that learnability breaks security for schemes with negligible decryption error. In other words, homomorphic capabilities can sometimes make noisy learning no harder than noiseless learning.

We use a simplified notion of learning, which essentially requires that given polynomially many labeled samples (from an arbitrary distribution), the learner's hypothesis correctly computes the label for the next sample with probability, say, 0.9. We show that this notion, which we call sc-learning, is equivalent to weak learning as defined in [KV94]. This allows us to prove the following theorem (in Section 3).
Theorem A. An encryption scheme whose decryption function is sc- or weakly-learnable, and whose decryption error is $1/2 - 1/\mathrm{poly}(n)$, cannot homomorphically evaluate the majority function.

Since it is straightforward to show that linear functions are learnable (as are, e.g., low-degree polynomials), the theorem applies to known LPN-based schemes such as [Ale03, GRS08, ACPS09]. This may not seem obvious at first: the decryption circuits of these schemes are (commonly assumed to be) hard to learn, and their decryption error is negligible, so they seem to be outside the scope of our theorem. However, on closer inspection, their decryption circuits consist of an inner product with the secret key, followed by additional post-processing. One can verify that if the post-processing is skipped, decryption is still correct with probability > 1/2 + 1/poly(n). Thus we can apply our theorem and rule out majority-homomorphism.

Similar logic rules out the homomorphism of the BL candidate FHE. While Theorem A does not apply directly (since BL’s decryption is not learnable out of the box), we show that the scheme contains a sub-scheme that is linear (and thus learnable) and has sufficient homomorphic properties to render it insecure.

Theorem B. There is a successful polynomial-time CPA attack on the BL scheme.

We further present a different attack on the BL scheme, targeting one of its building blocks. This allows us not only to distinguish between two messages, as in the CPA attack above, but to decrypt any ciphertext with probability 1 − o(1).

Theorem C. There is a polynomial-time algorithm that decrypts the BL scheme.

The BL scheme and the two breaking algorithms are presented in Section 4.

1.2 Our Techniques

Consider a simplified case of Theorem A, where the scheme’s decryption function is learnable given t labeled samples, and the decryption error is (say) 1/(10(t + 1)). The proof in this case is straightforward: generate t labeled samples by encrypting random messages, then use the learner’s output hypothesis to decrypt the challenge ciphertext. We can fail only if either the learner fails (which happens with probability 0.1) or one of the samples we draw (including the challenge) is not correctly decryptable, in which case our labeling is wrong and the learner provides no guarantee (which again happens with probability at most 0.1). The union bound implies that we can decrypt a random ciphertext with probability at least 0.8, which immediately breaks the scheme. Note that we did not use the homomorphism of the scheme at all; indeed, this simplified version holds even without assuming homomorphism, and is very similar to the arguments in [KV94, KS09]. However, some subtleties arise since we allow a non-negligible fraction of “dysfunctional” keys that induce a much higher error rate than others.

The next step is to allow decryption error 1/2 − ε, which requires the use of homomorphism. The idea is to use the homomorphism to reduce the decryption error and return to the previous case (in other words, to reduce the noise in a noisy learning problem). Consider a scheme that encrypts each message many times (say k), and then applies homomorphic majority to the ciphertexts. The security of this scheme reduces directly to that of the original scheme, and it has the same decryption function. However, the decryption error now drops exponentially with k. This is because, for the new scheme to err, at least k/2 of the k encryptions must be erroneous. Since the expected number of erroneous encryptions is (1/2 − ε)k, the Chernoff bound implies the result for an appropriate choice of k.
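To make the amplification concrete, the following Python sketch assumes (for illustration only) that each of the k encryptions fails independently with probability exactly 1/2 − ε. It computes the exact probability that at least k/2 of them fail and compares it with the Hoeffding form of the Chernoff bound, exp(−2ε²k). The function names are ours and are not part of any scheme.

```python
from math import comb, exp


def majority_failure(eps: float, k: int) -> float:
    """Exact Pr[at least k/2 of k independent encryptions fail], where each
    fails with probability 1/2 - eps (ties count as failures, pessimistically)."""
    q = 0.5 - eps
    first_bad = (k + 1) // 2  # smallest integer j with j >= k/2
    return sum(comb(k, j) * q**j * (1 - q) ** (k - j) for j in range(first_bad, k + 1))


def hoeffding_bound(eps: float, k: int) -> float:
    """Chernoff/Hoeffding: Pr[X - E[X] >= eps*k] <= exp(-2 eps^2 k)."""
    return exp(-2 * eps**2 * k)


if __name__ == "__main__":
    eps = 0.1
    for k in (11, 51, 101, 201):
        print(k, round(majority_failure(eps, k), 5), round(hoeffding_bound(eps, k), 5))
```

The exact tail decays much faster than the bound suggests for small k, but the bound already shows the exponential decay in ε²k that the argument needs.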

To derive Theorem B, we need to show that linear functions are learnable.¹ Assume that the decryption function is an inner product between the ciphertext and the secret key (both being n-dimensional vectors over a field 𝔽). We learn such functions from O(n) labeled samples. Then, given the challenge, we try to represent it as a linear combination of the samples we have. If we succeed, then the same linear combination of the labels is the value of the function on the challenge. We show that this process fails only with small constant probability (intuitively, since we take O(n) sample vectors from a space of dimension at most n).
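The “represent the challenge as a linear combination of the samples” step can be sketched in Python over a prime field 𝔽_p. This is an illustration under our own assumptions (p prime, vectors as integer lists, hypothetical helper names), not the paper’s code: we solve Σ λ_i x_i = x* by Gaussian elimination and, if a solution exists, output Σ λ_i f(x_i).

```python
import random
from typing import List, Optional


def solve_mod_p(A: List[List[int]], b: List[int], p: int) -> Optional[List[int]]:
    """One solution z of A z = b over GF(p) (p prime), or None if inconsistent."""
    m, t = len(A), len(A[0])
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(t):
        piv = next((i for i in range(r, m) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)  # Fermat inverse, valid since p is prime
        M[r] = [v * inv % p for v in M[r]]
        for i in range(m):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * w) % p for a, w in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    if any(M[i][t] for i in range(r, m)):  # a row reading 0 = nonzero
        return None
    z = [0] * t  # free variables set to 0
    for i, c in enumerate(pivots):
        z[c] = M[i][t]
    return z


def predict_by_span(samples, challenge: List[int], p: int) -> Optional[int]:
    """Label the challenge by writing it as a combination of the sample vectors."""
    n = len(challenge)
    A = [[x[j] for x, _ in samples] for j in range(n)]  # column i is sample x_i
    lam = solve_mod_p(A, challenge, p)
    if lam is None:
        return None  # challenge outside the span: the learner gives up
    return sum(l * y for l, (_, y) in zip(lam, samples)) % p


def demo(seed: int = 1, p: int = 101, n: int = 8, dim: int = 3, trials: int = 20) -> List[bool]:
    """Hypothetical experiment: f(x) = <s, x>, inputs drawn from a dim-dimensional subspace."""
    rng = random.Random(seed)
    s = [rng.randrange(p) for _ in range(n)]
    basis = [[rng.randrange(p) for _ in range(n)] for _ in range(dim)]

    def draw() -> List[int]:
        c = [rng.randrange(p) for _ in basis]
        return [sum(ci * v[j] for ci, v in zip(c, basis)) % p for j in range(n)]

    def f(x: List[int]) -> int:
        return sum(a * b for a, b in zip(s, x)) % p

    samples = [(x, f(x)) for x in (draw() for _ in range(10 * n))]
    results = []
    for _ in range(trials):
        x_star = draw()
        results.append(predict_by_span(samples, x_star, p) == f(x_star))
    return results
```

Whenever the challenge lies in the span, linearity of f makes the prediction exactly right; the only failure mode is returning None.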

We then show that BL uses a sub-structure that is both linearly decryptable and allows homomorphic evaluation of (a form of) majority. Theorem B thus follows similarly to Theorem A.

For Theorem C, we need to dive into the internals of the BL scheme. We notice that BL use homomorphic majority evaluation at one of the lower abstraction levels of their scheme. This allows us to break this abstraction level using only linear algebra (in a sense, the homomorphic evaluation is already “built in”). A complete break of BL follows.

1.3 Other Related Work

An independent work by Gauthier, Otmani and Tillich [GOT12] shows an interesting direct attack on BL’s hardness assumption (we refer to it as the “GOT attack”). Their attack is very different from ours and takes advantage of the resemblance between BL’s codes and Reed-Solomon codes, as we explain below.

BL’s construction relies on a special type of error-correcting code. Essentially, they start with a Reed-Solomon code and replace a small fraction of the rows of the generating matrix with a special structure. The homomorphic properties are due only to this small fraction of “significant” rows, and the secret key is chosen so as to nullify the effect of the other rows.

The GOT attack uses the fact that under a certain transformation (component-wise multiplication), the dimension of a Reed-Solomon code can grow by at most a factor of two. However, if a code contains “significant” rows, then the dimension can grow further. This allows one to measure the number of significant rows in a given code. One can then identify the significant rows by removing one row at a time from the code and checking whether the dimension drops; if it does, that row is significant. Once all significant rows have been identified, the secret key can be retrieved in a straightforward manner.

However, it is fairly easy to immunize BL’s scheme against the GOT attack. As explained above, the neutral rows do not change the properties of the encryption scheme, so they may as well be replaced by random rows. Since the dimension of random codes grows very rapidly under the GOT transformation, their attack does not work in that case.
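The dimension gap that the GOT attack exploits is easy to observe numerically. The Python sketch below is our own illustration: the field size p, length n and dimension k are arbitrary choices, and it does not implement the GOT attack itself. It compares the dimension of the component-wise square of a Reed-Solomon code, which is 2k − 1 whenever n ≥ 2k − 1, with that of a random code of the same dimension, which is typically min(n, k(k + 1)/2).

```python
import random
from typing import List, Tuple


def rank_mod_p(rows: List[List[int]], p: int) -> int:
    """Rank over GF(p), p prime, by Gaussian elimination."""
    M = [[v % p for v in row] for row in rows]
    rank = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(rank, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], p - 2, p)
        M[rank] = [v * inv % p for v in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank


def reed_solomon(k: int, points: List[int], p: int) -> List[List[int]]:
    """Generator rows x^0, ..., x^(k-1) evaluated at distinct points."""
    return [[pow(a, j, p) for a in points] for j in range(k)]


def schur_square(G: List[List[int]], p: int) -> List[List[int]]:
    """All component-wise products g_i * g_j (i <= j); they span the squared code."""
    k = len(G)
    return [[a * b % p for a, b in zip(G[i], G[j])] for i in range(k) for j in range(i, k)]


def demo(k: int = 5, n: int = 30, p: int = 101, seed: int = 0) -> Tuple[int, int, int, int]:
    rng = random.Random(seed)
    rs = reed_solomon(k, list(range(1, n + 1)), p)
    rand = [[rng.randrange(p) for _ in range(n)] for _ in range(k)]
    return (rank_mod_p(rs, p), rank_mod_p(schur_square(rs, p), p),
            rank_mod_p(rand, p), rank_mod_p(schur_square(rand, p), p))
```

For the Reed-Solomon code the square only contains evaluations of polynomials of degree at most 2k − 2, which is why its dimension stays below 2k.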

Our attack, on the other hand, relies on certain functional properties that BL use to make their scheme homomorphic. Thus a change to the scheme that preserves these homomorphic properties cannot help to overcome our attack. In light of our attack, it is interesting to investigate whether the GOT attack can be extended to the more general case.

¹We believe this was known before, but since we could not find an appropriate reference, we provide a proof.
2 Preliminaries


We denote scalars using plain lowercase (x), vectors using bold lowercase (x for a column vector, xᵀ for a row vector), and matrices using bold uppercase (X). We let 1 denote the all-ones vector (the dimension will be clear from the context). We let 𝔽 denote a finite field of cardinality q ∈ ℕ with efficient operations (we usually do not care about any other property of the field).

2.1 Properties of Encryption Schemes

A public-key encryption scheme is a tuple of algorithms (Gen, Enc, Dec) such that: Gen(1ⁿ) is the key generation algorithm, which produces a pair of public and secret keys (pk, sk); Enc_pk(m) is a randomized encryption function that takes a message m and produces a ciphertext (in the context of this work, messages come only from some predefined field 𝔽); and Dec_sk(c) is the decryption function, which decrypts a ciphertext c and produces the message. Ideally, Dec_sk(Enc_pk(·)) is the identity function, but some schemes have decryption errors.

The probability of decryption error is taken over the randomness used to generate the keys for the scheme and over the randomness used in the encryption function (we assume decryption is deterministic). Since in our case the error rates are high (approaching 1/2), the effect of bad keys differs from that of bad encryption randomness, and we therefore measure the two separately. We allow a small fraction of the keys (one percent, for convenience) to have arbitrarily large decryption error, and define the decryption error ε to be the maximal error over the best 99% of the keys. While the constant 1% is arbitrary and chosen so as not to clutter notation, after presenting our results we discuss how they generalize to other values. The formal definition follows.

Definition 2.1. An encryption scheme is said to have decryption error < ε if, with probability at least 0.99 over the key generation, it holds that

$$\max_{m} \Pr\left[\mathrm{Dec}_{sk}(\mathrm{Enc}_{pk}(m)) \neq m\right] < \varepsilon,$$

where the probability is taken over the random coins of the encryption function.

We use the standard definition of security against chosen-plaintext attacks (CPA): the attacker receives a public key and chooses two values m₀, m₁. The attacker then receives a ciphertext c = Enc_pk(m_b), where b ∈ {0, 1} is a random bit unknown to the attacker, and needs to output a guess b′ ∈ {0, 1} for the value of b. We say that the scheme is broken if there is a polynomial-time attacker for which Pr[b′ = b] > 1/2 + 1/poly(n) (where n is the security parameter). Recall that this notion is equivalent to semantic security [GM82].

In addition, we say that a scheme is completely broken if there exists an adversary that, upon receiving the public key and Enc_pk(m) for an arbitrary value of m, returns m with probability 1 − o(1).

While we discuss homomorphic properties of encryption schemes, we only use homomorphism with respect to the majority function. We define the notion of k-majority-homomorphism below.

Definition 2.2. A public-key encryption scheme is k-majority-homomorphic (where k is a function of the security parameter) if there exists a function MajEval such that, with probability 0.99 over the key generation, for any sequence of ciphertexts c_1, …, c_k output by Enc_pk(·), it holds that

$$\mathrm{Dec}_{sk}\left(\mathrm{MajEval}_{pk}(c_1,\ldots,c_k)\right) = \mathrm{Majority}\left(\mathrm{Dec}_{sk}(c_1),\ldots,\mathrm{Dec}_{sk}(c_k)\right).$$
Again we allow some “slackness” by allowing some of the keys not to abide by the homomorphism. We note that Definition 2.2 is a fairly strong notion of homomorphism in two respects. First, it requires that the homomorphism hold even for ciphertexts with decryption error. Second, we do not allow MajEval to introduce error for “good” key pairs. Indeed, known homomorphic encryption schemes have these properties, but it would be interesting to try to bypass our negative results by finding schemes that do not have them.

Schemes with linear decryption, as defined below, have a special role in our attack on BL.

Definition 2.3. An encryption scheme is n-linearly decryptable if its secret key is of the form sk = s ∈ 𝔽ⁿ, for some field 𝔽, and its decryption function is

$$\mathrm{Dec}_{sk}(\mathbf{c}) = \langle \mathbf{s}, \mathbf{c} \rangle.$$

2.2 Spanning Distributions over Low-Dimensional Spaces

We will use a lemma showing that any distribution over a low-dimensional space is easy to span in the following sense: given sufficiently many samples from the distribution (a little more than the dimension of the support), a new sample falls in the span of the previous samples except with small probability. This lemma will allow us to derive a (distribution-free) learner for linear functions (see Section 2.3).

We speculate that this lemma is already known, since it is fairly general and very robust to the definition of dimension (e.g., it also applies to non-linear spaces).

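Lemma 2.1. Let 𝒮 be a distribution over a linear space S of dimension s. For all k, define

$$\delta_k = \Pr_{\mathbf{v}_1,\ldots,\mathbf{v}_k \leftarrow \mathcal{S}}\left[\mathbf{v}_k \notin \mathrm{Span}\{\mathbf{v}_1,\ldots,\mathbf{v}_{k-1}\}\right].$$

Then δ_k ≤ s/k.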

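Proof. Notice that, by symmetry, δ_i ≥ δ_k for all i ≤ k: if v_k is outside the span of v_1, …, v_{k−1}, then in particular it is outside the span of the last i − 1 of them, and since the samples are i.i.d., the latter event has probability δ_i. Let D_i denote the (random variable) dimension of Span{v_1, …, v_i}; note that always D_i ≤ s.

Let E_i denote the event v_i ∉ Span{v_1, …, v_{i−1}}, and note that δ_i = Pr[E_i]. By definition,

$$D_k = \sum_{i=1}^{k} \mathbf{1}_{E_i}.$$

Therefore

$$s \ge \mathbb{E}[D_k] = \mathbb{E}\left[\sum_{i=1}^{k} \mathbf{1}_{E_i}\right] = \sum_{i=1}^{k} \Pr[E_i] = \sum_{i=1}^{k} \delta_i \ge k \cdot \delta_k,$$

and the lemma follows. □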
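As an empirical sanity check (not a proof), this Python sketch estimates δ_k by simulation over GF(2)⁴, so s = 4, for an arbitrary lopsided distribution of our own choosing, and compares the estimates with the bound s/k. Vectors are bitmasks, and the span test is the standard XOR-basis reduction; the event “the vector was added to the basis” is exactly E_k from the proof.

```python
import random


def insert_gf2(basis: dict, v: int) -> bool:
    """Reduce bitmask v against a GF(2) basis {leading bit: vector}.
    Adds v and returns True iff v was NOT already in the span (event E_k)."""
    while v:
        top = v.bit_length() - 1
        if top not in basis:
            basis[top] = v
            return True
        v ^= basis[top]
    return False


def skewed_sample(rng: random.Random) -> int:
    """A lopsided distribution on GF(2)^4 (an illustrative choice, s = 4)."""
    r = rng.random()
    if r < 0.7:
        return 0b0001             # a very common vector
    if r < 0.95:
        return rng.randrange(16)  # uniform over the whole space
    return 0b1000                 # a rare direction


def estimate_deltas(sample, k_max: int, trials: int, seed: int = 0):
    """Monte Carlo estimates of delta_1..delta_{k_max} (index 0 is unused)."""
    rng = random.Random(seed)
    counts = [0] * (k_max + 1)
    for _ in range(trials):
        basis: dict = {}
        for k in range(1, k_max + 1):
            if insert_gf2(basis, sample(rng)):
                counts[k] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    s, k_max = 4, 20
    deltas = estimate_deltas(skewed_sample, k_max, trials=10_000)
    for k in (1, 2, 4, 8, 16, 20):
        print(f"k={k:2d}  delta_k~{deltas[k]:.4f}  s/k={s / k:.4f}")
```

Summing the estimates gives the average final dimension, which mirrors the identity Σ δ_i = E[D_k] ≤ s used in the proof.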


2.3 Learning

In this work we use two equivalent notions of learning: weak learning as defined in [KV94], and a simplified notion that we call single-challenge learning (sc-learning for short). The latter is more convenient for our proofs, but we show that the two are equivalent. We also show that linear functions are sc-learnable.
Notions of Learning. We start by introducing the notion of weak learnability.

Definition 2.4 (weak learning [KV94]). Let ℱ = {ℱ_n}_{n∈ℕ} be an ensemble of binary functions. A weak learner for ℱ with parameters (t, ε, δ) is a polynomial-time algorithm A such that for any function f ∈ ℱ_n and any distribution 𝒟 over the inputs to f, the following holds. Let x_1, …, x_{t+1} ← 𝒟, and let h (“the hypothesis”) be the output of A(1ⁿ, (x_1, f(x_1)), …, (x_t, f(x_t))). Then

$$\Pr_{x_1,\ldots,x_t}\left[\Pr_{x_{t+1}}\left[h(x_{t+1}) \neq f(x_{t+1})\right] > \varepsilon\right] < \delta.$$

We say that ℱ is weakly learnable if there exists a weak learner for ℱ with parameters t = poly(n), ε < 1/2 − 1/poly(n), δ < 1 − 1/poly(n). (We also require that the output hypothesis h be polynomial-time computable.)

We next define our notion of (t, η)-sc-learning, which essentially corresponds to the ability to launch a t-query CPA attack on an (errorless) encryption scheme and succeed with probability 1 − η. (The initials “sc” stand for “single challenge”, reflecting the fact that a CPA attacker receives only a single challenge ciphertext.)

Definition 2.5 (sc-learning). Let ℱ = {ℱ_n}_{n∈ℕ} be an ensemble of functions. A (t, η)-sc-learner for ℱ is a polynomial-time algorithm A such that for any function f ∈ ℱ_n and any distribution 𝒟 over the inputs to f, the following holds. Let x_1, …, x_{t+1} ← 𝒟, and let h (“the hypothesis”) be the output of A(1ⁿ, (x_1, f(x_1)), …, (x_t, f(x_t))). Then Pr[h(x_{t+1}) ≠ f(x_{t+1})] ≤ η, where the probability is taken over the entire experiment.

We say that ℱ is (t, η)-sc-learnable if it has a polynomial-time (t, η)-sc-learner. We say that a binary ℱ is sc-learnable if it is (t, η)-sc-learnable with t = poly(n) and η < 1/2 − 1/poly(n). (We also require that the output hypothesis h be polynomial-time computable.)

Since sc-learning involves only one challenge, we do not define the “confidence” and “accuracy” parameters (δ, ε) separately as in the definition of weak learning.

We note that both definitions allow for improper learning (namely, the hypothesis h does not need to “look like” an actual decryption function).

Equivalence Between Notions. The equivalence of the two notions is fairly straightforward. Applying boosting [Sch90] shows that sc-learning, like weak learning, can be amplified.

Claim 2.2. If ℱ is sc-learnable then it is weakly learnable.

Proof. This follows by a Markov argument: consider a (t, η)-sc-learner for ℱ (recall that η < 1/2 − 1/poly(n)), and let δ = 1 − 1/poly(n) be such that η/δ < 1/2 − 1/poly(n) (such a δ must exist). Then setting ε = η/δ finishes the argument. □
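The Markov step can be written out as follows. Here Z denotes the error probability of the hypothesis conditioned on the t samples (and the learner’s coins), and p(n) is a polynomial with η ≤ 1/2 − 1/p(n); the particular choice δ = 1 − 1/(4p(n)) is ours and is only one of many that work.

```latex
Z := \Pr_{x_{t+1}}\left[h(x_{t+1}) \neq f(x_{t+1}) \,\middle|\, x_1,\ldots,x_t\right],
\qquad \mathbb{E}[Z] = \Pr\left[h(x_{t+1}) \neq f(x_{t+1})\right] \le \eta .

\Pr_{x_1,\ldots,x_t}\left[Z > \varepsilon\right] \le \frac{\mathbb{E}[Z]}{\varepsilon} \le \frac{\eta}{\varepsilon} = \delta
\qquad \text{for } \varepsilon := \eta/\delta .

\delta := 1 - \frac{1}{4p(n)} \;\Longrightarrow\;
\varepsilon = \frac{\eta}{\delta} \le \frac{1/2 - 1/p(n)}{1 - 1/(4p(n))} \le \frac{1}{2} - \frac{3}{4p(n)} .
```

The last inequality follows by cross-multiplying, since (1/2 − 3/(4p))(1 − 1/(4p)) = 1/2 − 7/(8p) + 3/(16p²) ≥ 1/2 − 1/p.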

The opposite direction gives very strong amplification of learning, by applying boosting.

Claim 2.3. If ℱ is weakly learnable then it is (poly(n, 1/η), η)-sc-learnable for all η.

Proof. Let ℱ be weakly learnable. Then by boosting [Sch90] it is also PAC learnable [Val84]; namely, there is a learner with parameters (poly(n, 1/ε, 1/δ), ε, δ) for any inversely polynomial ε, δ. Setting ε = δ = η/2, the overall error probability is at most δ + ε = η, and the claim follows. □
Learning Linear Functions. The following corollary (of Lemma 2.1) gives an sc-learner for the class of linear functions.²

Corollary 2.4. Let ℱ_n be the class of n-dimensional linear functions over a field 𝔽. Then ℱ = {ℱ_n}_n is (10n, 1/10)-sc-learnable.

Proof. The learner A is given t = 10n samples v_i = (x_i, f(x_i)) ∈ 𝔽ⁿ⁺¹. Using Gaussian elimination, A finds s ∈ 𝔽ⁿ such that (−s, 1) ∈ Ker{v_i}_{i∈[t]}, i.e., ⟨s, x_i⟩ = f(x_i) for all i ∈ [t] (such an s must exist, since the coefficient vector of f itself is one). Finally, A outputs the hypothesis h(x) = ⟨s, x⟩.

Correctness follows from Lemma 2.1. We let 𝒮 be the distribution of (x, f(x)) where x ← 𝒟, and let k = t + 1. Note that for any linear function f : 𝔽ⁿ → 𝔽, the set {(x, f(x))}_{x∈𝔽ⁿ} is an n-dimensional linear subspace of 𝔽ⁿ⁺¹, so Lemma 2.1 gives δ_{t+1} ≤ n/(10n + 1) < 1/10.

Therefore, with probability at least 1 − 1/10, it holds that (x_{t+1}, f(x_{t+1})) ∈ Span{v_i}_{i∈[t]}, which implies that ⟨(−s, 1), (x_{t+1}, f(x_{t+1}))⟩ = 0, or in other words f(x_{t+1}) = ⟨s, x_{t+1}⟩ = h(x_{t+1}). □
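A minimal Python sketch of the learner A from this proof, assuming 𝔽 = 𝔽_p for a prime p and using our own hypothetical helper names: it solves ⟨s, x_i⟩ = f(x_i) for all i by Gaussian elimination (the condition (−s, 1) ∈ Ker{v_i}) and outputs h(x) = ⟨s, x⟩. When the samples do not span 𝔽ⁿ, the recovered s need not equal the true coefficient vector, yet h still agrees with f on the span, which is all the proof needs.

```python
import random
from typing import Callable, List, Optional, Tuple


def solve_mod_p(A: List[List[int]], b: List[int], p: int) -> Optional[List[int]]:
    """One solution z of A z = b over GF(p) (p prime), or None if inconsistent."""
    m, t = len(A), len(A[0])
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(t):
        piv = next((i for i in range(r, m) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)
        M[r] = [v * inv % p for v in M[r]]
        for i in range(m):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * w) % p for a, w in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    if any(M[i][t] for i in range(r, m)):
        return None
    z = [0] * t
    for i, c in enumerate(pivots):
        z[c] = M[i][t]
    return z


def dot(u: List[int], v: List[int], p: int) -> int:
    return sum(a * b for a, b in zip(u, v)) % p


def learn_linear(samples: List[Tuple[List[int], int]], p: int) -> Callable[[List[int]], int]:
    """Find s with <s, x_i> = f(x_i) for every sample; return h(x) = <s, x>."""
    s = solve_mod_p([x for x, _ in samples], [y for _, y in samples], p)
    if s is None:  # impossible when the labels really come from a linear f
        raise ValueError("samples are not consistent with any linear function")
    return lambda x: dot(s, x, p)


def subspace_sampler(rng: random.Random, basis: List[List[int]], p: int):
    """Sampler for a distribution supported on Span(basis)."""
    n = len(basis[0])

    def draw() -> List[int]:
        c = [rng.randrange(p) for _ in basis]
        return [sum(ci * v[j] for ci, v in zip(c, basis)) % p for j in range(n)]
    return draw


def failure_rate(p: int = 101, n: int = 6, dim: int = 4, runs: int = 200, seed: int = 7) -> float:
    """Fraction of runs with h(x_{t+1}) != f(x_{t+1}), using t = 10n samples."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(runs):
        s_true = [rng.randrange(p) for _ in range(n)]
        draw = subspace_sampler(rng, [[rng.randrange(p) for _ in range(n)] for _ in range(dim)], p)
        samples = [(x, dot(s_true, x, p)) for x in (draw() for _ in range(10 * n))]
        h = learn_linear(samples, p)
        x_new = draw()
        fails += h(x_new) != dot(s_true, x_new, p)
    return fails / runs
```

Over a large field the observed failure rate is far below the 1/10 guaranteed by the corollary; the bound matters for small fields and adversarial distributions.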

3 Homomorphism is a Liability When Decryption is Learnable

This section presents our main result. We show that schemes with learnable decryption circuits are very limited in terms of their homomorphic properties, regardless of decryption error. This extends the previous results of [KV94, KS09], which show that the decryption function cannot be learnable if the decryption error is negligible.

We start by showing that a scheme with a (t, 1/10)-sc-learnable decryption function (i.e., efficiently learnable from t samples with error probability 1/10; see Definition 2.5) cannot have decryption error smaller than 1/(10(t + 1)) and be secure (regardless of homomorphism). We proceed to show that if the scheme can homomorphically evaluate the majority function, then the above amplifies dramatically, and security cannot be guaranteed for any reasonable decryption error (error 1/2 − ε for any noticeable ε). Using Claim 2.3 (boosting), this implies that the above holds for any scheme with weakly-learnable (or sc-learnable) decryption. We then discuss the role of key generation error compared to encryption error.

For the sake of simplicity, we focus on the public-key setting. However, our proofs easily extend to symmetric encryption as well, since our attacks use the public key only to generate ciphertexts of known messages.

Learnable Decryption without Homomorphism. We start by showing that a scheme whose decryption circuit is (t, 1/10)-sc-learnable must have decryption error ε = Ω(1/t), otherwise it is insecure. This is a parameterized and slightly generalized version of the claims of [KV94, KS09], geared towards schemes with high decryption error and possibly bad keys. The basic idea is straightforward: we use the public key to generate t ciphertexts to be used as labeled samples for our learner, and then use its output hypothesis to decrypt the challenge ciphertext. This succeeds as long as the learner succeeds and all samples in the experiment (including the challenge) decrypt correctly, which by the union bound happens with probability at least 1 − (t + 1)·ε, up to the learner’s own error. A formal statement and proof follows.

Lemma 3.1. An encryption scheme whose decryption function is (t, 1/10)-sc-learnable for a polynomial t, and whose decryption error is < 1/(10(t + 1)), is insecure.

²The learner works even when the function class is not binary, which is only an advantage. The binary case follows by considering distributions supported only over the pre-images of 0, 1.
Proof. Consider a key pair (pk, sk) for the scheme, and consider the following CPA adversary. The adversary first generates t labeled samples of the form (Enc_pk(m), m), for random messages m ← {0, 1} (where 0, 1 serve as generic names for an arbitrary pair of elements in the scheme’s message space). These samples are fed into the aforementioned learner; let h denote the learner’s output hypothesis. The adversary sets m₀ = 0, m₁ = 1, and, given the challenge ciphertext c = Enc_pk(m_b), it outputs b′ = h(c).

To analyze the adversary, we define 𝒟 to be the (inefficient) distribution c = Enc_pk(m) | (Dec_sk(c) = m), for a randomly chosen m ← {0, 1}. Namely, the distribution 𝒟 first samples m ← {0, 1}, and then outputs a random correctly decryptable encryption of m. By Definition 2.5, if the learner gets t samples from this distribution, it outputs a hypothesis that correctly labels the (t + 1)-th sample with all but 1/10 probability.

While we cannot efficiently sample from 𝒟 (without the secret key), we show that the samples (and the challenge) that we feed to our learner are in fact statistically close to samples from 𝒟. Consider the case where (pk, sk) are such that the decryption error is indeed smaller than ε = 1/(10(t + 1)). In that case, our adversary samples from a distribution at statistical distance at most ε from 𝒟, and the challenge ciphertext is drawn from the same distribution. It follows that the set of t + 1 samples considered during the experiment (the labeled samples and the challenge) agrees with 𝒟 with all but (t + 1)·ε = 1/10 probability.

Using the union bound on all the aforementioned “bad” events (the key pair not conforming to the decryption error bound of Definition 2.1, the samples not agreeing with 𝒟, and the learner failing), we get that Pr[b′ = b] ≥ 1 − 0.01 − 1/10 − 1/10 > 0.7, and the lemma follows. □

Using Claim 2.3, we derive the following corollary.

Corollary 3.2. An encryption scheme whose decryption function is weakly learnable must have decryption error at least 1/poly(n) for some polynomial.

We note that this corollary does not immediately follow from [KV94, KS09] if a noticeable fraction of the keys can be “bad” (since they do not use boosting).

Plugging our learner for linear functions (Corollary 2.4) into Lemma 3.1 implies the following, which will be useful in the next section.

Corollary 3.3. There exists a constant α > 0 such that any n-linearly decryptable scheme with decryption error < α/n is insecure.

Learnable Decryption with Majority Homomorphism. Lemma 3.1 and Corollary 3.2 by themselves are not very restrictive. Specifically, they are not directly applicable to attacking any known scheme. Indeed, known schemes with linear decryption (e.g., LPN-based ones) have sufficiently high decryption error (or, viewed differently, adding the error makes the underlying decryption hard to learn). We now show that if homomorphism is required as a property of the scheme, then decryption error cannot save us.

The following theorem states that majority-homomorphic schemes (see Definition 2.2) cannot have learnable decryption for any reasonable decryption error.

Theorem 3.4. An encryption scheme whose decryption circuit is (t, 1/10)-sc-learnable for a polynomial t, and whose decryption error is < 1/2 − ε, cannot be O(log t/ε²)-majority-homomorphic.
Let us first outline the proof of Theorem 3.4 before formalizing it. Our goal is the same as in the proof of Lemma 3.1: to generate t labeled samples, which will enable us to break security. However, unlike above, taking t random encryptions will surely introduce decryption errors. We thus use the majority homomorphism: we generate a good encryption of m, i.e., one that decrypts correctly with high probability, by generating O(log t/ε²) random encryptions of m and applying majority homomorphically. The Chernoff bound guarantees that, with high probability, more than half of the ciphertexts decrypt correctly, and therefore the output of the majority evaluation is, with high probability, a correctly decryptable encryption of m. At this point, we can apply the same argument as in the proof of Lemma 3.1. The formal proof follows.

Proof. Consider an encryption scheme (Gen, Enc, Dec) as in the theorem statement. We construct a new scheme (Gen′ = Gen, Enc′, Dec′ = Dec) (with the same key generation and decryption algorithms) whose security reduces to that of (Gen, Enc, Dec). Then we use Lemma 3.1 to show that the new scheme, and hence the original one, is insecure.

The new encryption algorithm Enc′_pk(m) works as follows: to encrypt a message m, invoke the original encryption Enc_pk(m) (say) k = 10(ln(t + 1) + ln(10))/ε² times, thus generating k ciphertexts. Apply MajEval to these k ciphertexts and output the resulting ciphertext.

The security of the new scheme is related to that of the original by a straightforward hybrid argument. We will show that the new scheme has decryption error at most 1/(10(t + 1)), but in a slightly weaker sense than Definition 2.1: we allow 2% of the keys to be “bad” instead of just 1% as before. One can easily verify that the proof of Lemma 3.1 works in this case as well.

Our set of good key pairs for Enc′ consists of those for which Dec_sk(Enc_pk(·)) indeed has decryption error at most 1/2 − ε and, in addition, MajEval is correct. By the union bound, this happens with probability at least 0.98.

To bound the decryption error of Dec_sk(Enc′_pk(·)), assume that we have a good key pair as described above. We bound the probability that more than a 1/2 − ε/2 fraction of the k ciphertexts generated by Enc′ are decrypted incorrectly. Clearly, if this bad event does not happen, then by the correctness of MajEval, the resulting ciphertext decrypts correctly.

Recalling that the expected fraction of incorrectly decrypted ciphertexts is at most 1/2 − ε, the Chernoff bound implies that the aforementioned bad event happens with probability at most

$$e^{-2(\varepsilon/2)^2 k} < \frac{1}{10(t+1)},$$

and the theorem follows. □
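The arithmetic behind the choice of k can be checked numerically. This short Python sketch (the function names are ours) computes k = ⌈10(ln(t + 1) + ln 10)/ε²⌉ and checks that e^{−2(ε/2)²k} = e^{−ε²k/2} is at most (10(t + 1))^{−5}, which is well below 1/(10(t + 1)), for a few sample values of t and ε.

```python
import math


def repetitions(t: int, eps: float) -> int:
    """k = 10 (ln(t+1) + ln 10) / eps^2, rounded up: the number of fresh
    encryptions that Enc' feeds into MajEval."""
    return math.ceil(10 * (math.log(t + 1) + math.log(10)) / eps**2)


def bad_event_bound(k: int, eps: float) -> float:
    """Chernoff/Hoeffding bound exp(-2 (eps/2)^2 k) on more than a
    (1/2 - eps/2) fraction of the k ciphertexts decrypting incorrectly."""
    return math.exp(-2 * (eps / 2) ** 2 * k)


if __name__ == "__main__":
    for t, eps in [(100, 0.1), (10_000, 0.05), (10**6, 0.01)]:
        k = repetitions(t, eps)
        print(t, eps, k, bad_event_bound(k, eps), 1 / (10 * (t + 1)))
```

Algebraically, ε²k/2 ≥ 5 ln(10(t + 1)), so the bound is (10(t + 1))^{−5} or smaller; the constant 10 in k leaves ample slack.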

From the proof it is clear that even “approximate-majority homomorphism” suffices for the theorem to hold; namely, it suffices that MajEval computes the majority function correctly whenever the fraction of identical inputs is more than 1/2 + ε/2.

We can derive a general corollary for every weakly-learnable function class using Claim 2.3. This applies, for example, to linear functions, low-degree polynomials and shallow circuits.

Corollary 3.5. An encryption scheme whose decryption function is weakly learnable and whose decryption error is 1/2 − ε cannot be ω(log n/ε²)-majority-homomorphic.

The Role of Bad Keys. Recall that in Definitions 2.1 and 2.2 (decryption error and majority homomorphism) we allowed a constant fraction of keys to be useless for the purposes of decryption and homomorphic evaluation, respectively. In fact, it is this relaxation that makes our argument more involved than those of [KV94, KS09].

As mentioned above, the choice of the constant 0.01 is arbitrary. Let us now explain how our results extend to the case of a 1/2 − κ fraction of bad keys, where κ = 1/poly(n) (we now count the keys that are bad either for decryption or for homomorphism). In that case, the argument of Lemma 3.1 works as long as we start with a (t, η)-sc-learner with η < κ/3, and as long as the decryption error for good keys is at most κ/(3(t + 1)). If the scheme is furthermore O(log(t/κ)/ε²) = O(log n/ε²)-majority-homomorphic, the proof of Theorem 3.4 also goes through. Finally, using boosting, we can start with any weak learner and reduce η to below κ/3 at the cost of a polynomial increase in t, which is tolerable by our arguments (and absorbed by the asymptotic notation).
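For concreteness, the union bound behind this paragraph can be spelled out as follows. This is our own rendering of the calculation, pessimistically assuming that the adversary gains nothing on bad keys:

```latex
\Pr[b' = b] \;\ge\; \underbrace{\left(\tfrac12 + \kappa\right)}_{\text{good keys}}
\;-\; \underbrace{\eta}_{\text{learner fails}}
\;-\; \underbrace{(t+1)\cdot\frac{\kappa}{3(t+1)}}_{\text{some sample decrypts wrongly}}
\;>\; \frac12 + \kappa - \frac{\kappa}{3} - \frac{\kappa}{3} \;=\; \frac12 + \frac{\kappa}{3}.
```

Since κ = 1/poly(n), the advantage κ/3 is noticeable, so the scheme is broken in the sense of the CPA definition above.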

4  Attacks  on  the  BL  Scheme 

In  this  section  we  use  our  tools  from  above  to  show  that  the  BL  scheme  (outlined  in  Section  4.1 
below)  is  broken.  We  present  two  attacks:  the  first,  in  Section  4.2,  follows  from  Corollary  3.3  (and 
works  in  the  spirit  of  Theorem  3.4);  and  the  second,  in  Section  4.3,  directly  attacks  a  lower  level 
subcomponent  of  the  scheme  and  allows  to  decrypt  any  ciphertext.  In  fact,  the  latter  attack  also 
follows  the  same  basic  principles  and  exploits  a  “built-in”  evaluation  of  majority  that  exists  in  that 
sub-component  of  BL. 

4.1  Essentials  of  the  BL  Scheme 

In  this  section  we  present  the  properties  of  the  BL  scheme.  We  concentrate  on  the  properties  that 
are  required  for  our  breaks.  We  refer  the  reader  to  [BL11]  for  further  details. 

The BL scheme has a number of layers of abstraction, which are all instantiated based on a global parameter $0 < \alpha < 0.25$ as explained below.

The Scheme $K_q(n)$. BL introduce $K_q(n)$, a public-key encryption scheme with imperfect correctness. For security parameter $n$, the public key is a matrix $P \in \mathbb{F}_q^{n \times r}$, where $r = n^{1-\alpha/8}$, and the secret key is a vector $y \in \mathbb{F}_q^n$ in the kernel of $P^T$ (namely, $y^T \cdot P = 0$). The keys are generated in a highly structured manner in order to support homomorphism, but their structure is irrelevant to us. An encryption of a message $m \in \mathbb{F}_q$ is a vector $c = P \cdot x + m \cdot \mathbf{1} + e$, where $x \in \mathbb{F}_q^r$ is some vector, and where $e \in \mathbb{F}_q^n$ is a low Hamming weight vector. Decryption is performed by taking the inner product $\langle y, c \rangle$, and succeeds so long as $\langle y, e \rangle = 0$ (the vector $y$ is chosen such that $\langle y, \mathbf{1} \rangle = 1$). BL show how the structure of the keys implies that decryption succeeds with probability at least $(1 - n^{-(1-\alpha/2)})$. Finally, BL show that $K_q(n)$ is homomorphic with respect to a single addition or multiplication.³
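The correctness condition follows from a one-line calculation. It uses only $y^T \cdot P = 0$ and $\langle y, \mathbf{1} \rangle = 1$. This derivation is added for the reader and is not quoted from [BL11]:

```latex
\langle y, c \rangle
  = y^T P x + m \, \langle y, \mathbf{1} \rangle + \langle y, e \rangle
  = 0 + m + \langle y, e \rangle ,
```

so the inner product returns $m$ exactly when the noise vector $e$ is orthogonal to $y$.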

Re-Encryption. In order to enable homomorphism, BL introduce the notion of re-encryption. Consider an instantiation of $K_q(n)$, with keys $(P, y)$, and an instantiation of $K_q(n')$ with keys $(P', y')$, for $n' = n^{1+\alpha}$. Let $H_{n':n} \in \mathbb{F}_q^{n' \times n}$ be an element-wise encryption of $y$ using the public key

³ Homomorphic operations (addition, multiplication) are performed element-wise on ciphertext vectors, and the structure of the key guarantees that correctness is preserved.
$P'$.⁴ Namely, $H_{n':n} = P' \cdot X' + \mathbf{1} \cdot y^T + E'$. Due to the size difference between the schemes, it holds that with probability $(1 - n^{-\Omega(1)})$ all of the columns of $H_{n':n}$ are simultaneously decryptable and indeed $y'^T \cdot H_{n':n} = y^T$. In such case, for any ciphertext $c$ of $K_q(n)$, we get $\langle y', H_{n':n} c \rangle = \langle y, c \rangle$. The matrix $H_{n':n}$ therefore re-encrypts ciphertexts of $K_q(n)$ as ciphertexts of $K_q(n')$.
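The re-encryption identity can be checked directly from the definition of $H_{n':n}$. The check assumes $y'^T P' = 0$, $\langle y', \mathbf{1} \rangle = 1$ and $y'^T E' = 0$, the last meaning that every column is decryptable. This is a sketch of the calculation, not a quotation from [BL11]:

```latex
y'^T H_{n':n}
  = y'^T P' X' + (y'^T \mathbf{1})\, y^T + y'^T E'
  = 0 + y^T + 0 = y^T,
\qquad\text{hence}\qquad
\langle y', H_{n':n} c \rangle = y'^T H_{n':n} c = y^T c = \langle y, c \rangle .
```

The same expression shows that every re-encrypted ciphertext $H_{n':n} c$ lies in the column span of $H_{n':n}$. That span has dimension at most $n$.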

The critical idea for our second break is that a re-encrypted ciphertext always belongs to a linear subspace of dimension at most $n$ (recall that $n \ll n'$), namely to the span of the columns of $H_{n':n}$.

The Scheme BASIC. Using re-encryption, BL construct a ladder of schemes of increasing lengths that allow for homomorphic evaluation. They define the scheme BASIC, which has an additional depth parameter $d = O(1)$ (BL suggest using $d = 8$, but our attack works for any $d > 1$). They consider instantiations of $K_q(n_i)$, where $n_i = n^{(1+\alpha)^{i-d}}$, for $i = 0, \ldots, d$, so $n_d = n$.

They generate all re-encryption matrices $H_{n_{i+1}:n_i}$ (with success probability $(1 - n^{-\Omega(1)})$) and can thus homomorphically evaluate depth-$d$ circuits.

The homomorphic evaluation works by performing a homomorphic operation at level $i$ of the evaluated circuit (with $i$ going from $0$ to $d - 1$), and then using re-encryption with $H_{n_{i+1}:n_i}$ to obtain a fresh ciphertext for the next level.

For the purposes of our (second) break, we notice that the last step of this evaluation is re-encryption using $H_{n_d:n_{d-1}}$. This means that homomorphically evaluated ciphertexts all come from a linear subspace of dimension at most $n_{d-1} = n^{1/(1+\alpha)}$.
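The ladder of instance sizes can be tabulated numerically. The sketch below assumes the formula $n_i = n^{(1+\alpha)^{i-d}}$ given above. The values of $n$, $\alpha$ and $d$ are hypothetical choices. It only checks the size relations $n_{i+1} = n_i^{1+\alpha}$, $n_d = n$ and $n_{d-1} = n^{1/(1+\alpha)}$, and models nothing about the keys themselves.

```python
import math


def basic_ladder(n, alpha, d):
    """Instance sizes n_i = n ** ((1 + alpha) ** (i - d)) for i = 0, ..., d."""
    return [n ** ((1 + alpha) ** (i - d)) for i in range(d + 1)]


# Hypothetical parameters: n = 2^40, alpha = 0.2 (inside 0 < alpha < 0.25), d = 8.
n, alpha, d = 2.0 ** 40, 0.2, 8
sizes = basic_ladder(n, alpha, d)
for i, n_i in enumerate(sizes):
    print(f"n_{i} ~ 2^{math.log2(n_i):.2f}")
# Dimension bound for homomorphically evaluated ciphertexts: n_{d-1} = n^(1/(1+alpha)).
print(f"rank bound n_(d-1) ~ 2^{math.log2(sizes[d - 1]):.2f}")
```

Each level's size is the previous one raised to $1+\alpha$, so only the last level reaches the full size $n$. This explains why the final re-encryption confines ciphertexts to a much smaller space.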

Error Correction and the Matrix $H_{n:n}$. The scheme BASIC only allows homomorphism at the expense of increasing the instance size (namely $n$). BL next show that it is possible to use BASIC to generate a re-encryption matrix without a size increase.

They generate an instance of BASIC, with public keys $P_0, \ldots, P_d$, secret key $y_d = y^*$, and re-encryption matrices $H_{n_{i+1}:n_i}$. An additional independent instance of $K_q(n)$ is generated, whose keys we denote by $(P, y)$. Then, a large number of encryptions of the elements of $y$ under public key $P_0$ are generated.⁵ While some of these ciphertexts may have encryption error, BL show that by homomorphically evaluating a depth-$d$ correction circuit (CORR in their notation), one can obtain a matrix $H_{n:n}$ whose columns are encryptions of the elements of $y$ that are decryptable under $y^*$ without error. This process succeeds with probability $(1 - n^{-\Omega(1)})$.

The resemblance to the learner of Corollary 2.4 is apparent. In a sense, the public key of BASIC is a ready-to-use learner.

To conclude this part, BL generate a re-encryption matrix $H_{n:n}$ that takes ciphertexts under $y$ and produces ciphertexts under $y^*$. Since $H_{n:n}$ is produced using homomorphic evaluation, its rank is at most $n_{d-1} = n^{1/(1+\alpha)}$. We will capitalize on the fact that re-encryption using $H_{n:n}$ produces ciphertexts that all reside in a low-dimensional space.

Achieving Full Homomorphism: The Scheme HOM. The basic idea is to generate a sequence of matrices $H_{n:n}$, thus creating a chain of the respective secret keys that will allow homomorphism of any depth. However, generating an arbitrarily large number of such re-encryption matrices will eventually cause an error somewhere down the line. Therefore, a more sophisticated

⁴ A note on notation: In [BL11], the re-encryption parameters are denoted by I (as opposed to our H). We feel
that  their  notation  ignores  the  important  linear  algebraic  structure  of  the  re-encryption  parameters,  and  therefore 
we  switched  to  matrix  notation,  which  also  dictated  the  change  of  letter. 

⁵ To be absolutely precise, BL encrypt a bit decomposition of $y$, but this is immaterial to us.
solution is required. BL suggest encrypting each message a large number of times and generating a large number of re-encryption matrices per level. Then, since the vast majority of matrices per level are guaranteed to be correct, one can use a shallow approximate-majority computation to guarantee that the fraction of erroneous ciphertexts per level does not increase with homomorphic evaluation.

Decryption is performed as follows: Each ciphertext is a set of ciphertexts $c_1, \ldots, c_k$ of $K_q(n)$ (all with the same secret key). The decryption process first uses the $K_q(n)$ key to decrypt the individual ciphertexts and obtain $m_1, \ldots, m_k$, and then outputs the majority of the values $m_i$. BL show that a large majority of the ciphertexts (say, more than a $15/16$ fraction) are indeed correct, which guarantees correct decryption.
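The decryption rule of HOM is simple enough to sketch directly: decrypt every component, then take the majority. The toy below is our own illustration. It works over a small hypothetical field $\mathbb{F}_{97}$ with a hand-picked key rather than BL's structured keys.

```python
from collections import Counter


def kq_decrypt(y, c, q):
    """K_q(n) decryption: the inner product <y, c> over F_q."""
    return sum(a * b for a, b in zip(y, c)) % q


def hom_decrypt(y, ciphertexts, q):
    """Decrypt each K_q(n) component c_1..c_k and output the most common value."""
    votes = Counter(kq_decrypt(y, c, q) for c in ciphertexts)
    return votes.most_common(1)[0][0]


q = 97
y = [1, 0, 0, 0]                   # toy key with <y, 1> = 1
good = [[7, 3, 5, 9]] * 15         # 15 components that decrypt to 7
bad = [[8, 3, 5, 9]]               # 1 component with a decryption error
print(hom_decrypt(y, good + bad, q))  # prints 7
```

With 15 of 16 components correct, the single faulty decryption is simply outvoted.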

BL thus achieve a (leveled) fully homomorphic scheme, which they denote by HOM; this completes their construction.

4.2  An  Attack  on  BL  Using  Homomorphism 

We will show how to break the BL scheme using its homomorphic properties. We use Corollary 3.3, and our proof contains elements similar to the proof of Theorem 3.4. (The specifics of BL do not allow us to use Corollary 3.5 directly.)

Theorem  4.1.  There  is  a  polynomial  time  CPA  attack  on  BL. 

Proof.  Clearly  we  cannot  apply  our  methods  to  the  scheme  HOM  as  is,  since  its  decryption  is  not 
learnable.  We  thus  describe  a  related  scheme  which  is  “embedded”  in  HOM  and  show  how  to 
distinguish  encryptions  of  0  from  encryptions  of  1,  which  will  imply  a  break  of  HOM. 

We recall that the public key of HOM contains “chains” of re-encryption matrices of the form $H_{n:n}$. The length of the chains is related to the homomorphic depth of HOM. Our sub-scheme will only require a chain of constant length $\ell$, which will be determined later (such a sub-chain must therefore exist for any instantiation of BL that allows for more than constant-depth homomorphism). Granted that all links in the chain are successfully generated (which happens with probability $1 - \ell \cdot n^{-\Omega(1)}$), such a chain allows homomorphic evaluation of any depth-$\ell$ function. Let us focus on the case where the chain is indeed properly generated.

Intuitively, we would have liked to use this structure to evaluate majority on $2^\ell$ input ciphertexts. However, BL is defined over a large field $\mathbb{F}$, and it is not clear how to implement majority over $\mathbb{F}$ in depth that does not depend on $q = |\mathbb{F}|$. To solve this problem, we use BL’s CORR function. This function is just a NAND tree of depth $\ell$ (extended to $\mathbb{F}$ in the obvious way: $\mathrm{NAND}(x, y) = 1 - xy$). BL show that given $2^\ell$ inputs, each of which is $0$ (respectively $1$) with probability $1 - \epsilon$, the output of CORR will be $0$ (resp. $1$) with probability $1 - O(\epsilon)^{2^{\ell/2}}$.
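The error-correcting behaviour of CORR can be checked on plaintext bits. The sketch below is our own and tracks only plaintext values, not the homomorphic evaluation; the field size $q = 97$ is a hypothetical choice. It implements the field extension $\mathrm{NAND}(x, y) = 1 - xy$. For i.i.d. binary inputs, it also computes the exact error probability of an even-depth tree. That error roughly squares every two levels, in line with the $O(\epsilon)^{2^{\ell/2}}$ bound.

```python
def nand(x, y, q):
    """NAND extended to F_q as in the text: NAND(x, y) = 1 - x*y (mod q)."""
    return (1 - x * y) % q


def corr(inputs, q):
    """Plaintext view of CORR: a depth-l NAND tree applied to 2^l field elements."""
    layer = list(inputs)
    assert layer and len(layer) & (len(layer) - 1) == 0, "need 2^l inputs"
    while len(layer) > 1:
        layer = [nand(layer[i], layer[i + 1], q) for i in range(0, len(layer), 2)]
    return layer[0]


def corr_error(eps, depth, bit):
    """Exact probability that an even-depth NAND tree outputs 1 - bit when each
    input is independently equal to bit with probability 1 - eps (inputs in {0, 1})."""
    assert depth % 2 == 0, "an even-depth tree maps constant inputs to themselves"
    p_one = eps if bit == 0 else 1 - eps  # P[an input equals 1]
    for _ in range(depth):
        p_one = 1 - p_one * p_one         # NAND is 0 only when both inputs are 1
    return p_one if bit == 0 else 1 - p_one


q = 97                                     # hypothetical small field size
print(corr([1, 0, 0, 0], q), corr([0, 1, 1, 1], q))  # single errors corrected: 0 1
for depth in (2, 4, 6):
    print(depth, corr_error(0.1, depth, 0), corr_error(0.1, depth, 1))
```

An odd-depth tree negates constant inputs, which is why the error function insists on even depth.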

To encrypt a message $m \in \{0, 1\}$ using our sub-scheme, we will generate $2^\ell$ ciphertexts. Each ciphertext will be an independent encryption of $m$ using the public key of HOM (which essentially generates $K_q(n)$ ciphertexts that correspond to the first link in all chains). We then apply CORR homomorphically to the generated ciphertexts. Decryption in our sub-scheme will be standard $K_q(n)$ decryption (which is a linear function) using the secret key that corresponds to the last link in the chain.⁶

We recall that the decryption error of $K_q(n)$ is $\epsilon = n^{-(1-\alpha/2)}$. By the properties of CORR, we can choose $\ell = O(1)$ such that the decryption error of our sub-scheme is at most (say) $o(1)/n$.
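For illustration, take $\epsilon = n^{-(1-\alpha/2)}$ as recalled above. Then the smallest even depth $\ell = 2$ already meets the target. This is our own back-of-the-envelope instantiation of the CORR bound as stated, not a calculation from [BL11]:

```latex
O(\epsilon)^{2^{\ell/2}}\Big|_{\ell = 2}
  = O(\epsilon^2)
  = O\!\left(n^{-(2-\alpha)}\right)
  = \frac{1}{n} \cdot O\!\left(n^{-(1-\alpha)}\right)
  = \frac{o(1)}{n}
  \qquad (\text{since } 0 < \alpha < 0.25).
```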

⁶ The secret key of the last link is not the same as the secret key of HOM, since we are only considering a sub-chain
of  a  much  longer  chain.  However,  this  is  not  a  problem:  Our  arguments  do  not  require  that  the  secret  key  is  known 
to  anyone  in  order  to  break  the  scheme. 
In conclusion, we get a sub-scheme of HOM such that with probability $1 - \ell \cdot n^{-\Omega(1)} > 0.9$ over the key generation, the decryption error is at most $o(1)/n$. Furthermore, decryption is linear. Corollary 3.3 implies that such a scheme must be insecure. □

4.3  A  Specific  Attack  on  BL 

We noticed that the scheme BASIC, which is a component of HOM, contains by design a homomorphic evaluation of majority: this is how the matrix $H_{n:n}$ is generated. We thus present an attack that only uses the matrix $H_{n:n}$ and allows one to completely decrypt BL ciphertexts (even non-binary ones) with probability $1 - n^{-\Omega(1)}$. We recall that an attack completely breaks a scheme if it can decrypt any given ciphertext with probability $1 - o(1)$.

Theorem  4.2.  There  exists  a  polynomial  time  attack  that  completely  breaks  BASIC,  and  thus 
also  BL. 

Proof. We consider the re-encryption matrix $H = H_{n:n} \in \mathbb{F}_q^{n \times n}$ described in Section 4.1, which re-encrypts ciphertexts under $y$ into ciphertexts under $y^*$. The probability that $H$ was successfully generated is at least $1 - n^{-\Omega(1)}$, in which case it holds that

$$y^{*T} \cdot H = y^T.$$

In addition, as we explained in Section 4.1, the rank of $H$ is at most $h = n_{d-1} = n^{1/(1+\alpha)}$.

Our breaker will be given $H$ and the public key $P$ that corresponds to $y$, and will be able to decrypt any vector $c = \mathrm{Enc}_P(m)$ with high probability, namely compute $\langle y, c \rangle$.

Breaker Code. As explained above, the input to the breaker is $H$, $P$ and a challenge $c = \mathrm{Enc}_P(m)$. The breaker executes as follows:

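1. Generate $k = h^{1+\epsilon}$ encryptions of $0$, denoted $v_1, \ldots, v_k$, for a small constant $\epsilon > 0$ (any positive number smaller than $\alpha(1-\alpha)/2$ will do).

Note that this means that with probability $1 - n^{-\Omega(1)}$ all $v_i$ are decryptable encryptions of $0$.

Intuitively, these vectors, once projected through $H$, will span the projections of all decryptable encryptions of $0$.

2. For all $i = 1, \ldots, k$, compute $v_i^* = H \cdot v_i$ (the projections of the ciphertexts above through $H$). Also compute $o^* = H \cdot \mathbf{1}$ (the projection of the all-one vector).

3. Find a vector $\tilde{y}^* \in \mathbb{F}_q^n$ such that $\langle \tilde{y}^*, v_i^* \rangle = 0$ for all $i$, and such that $\langle \tilde{y}^*, o^* \rangle = 1$. Such a vector necessarily exists if all $v_i$'s are decryptable, since $y^*$ is an example of such a vector. (This is a system of $k + 1$ linear equations in the $n$ unknown entries of $\tilde{y}^*$, which can be solved by Gaussian elimination over $\mathbb{F}_q$.)

4. Given a challenge ciphertext $c$, compute $c^* = H \cdot c$ and output $m = \langle \tilde{y}^*, c^* \rangle$ (namely, $m = \tilde{y}^{*T} \cdot H \cdot c$).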
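The four steps can be exercised end to end on a toy instance. The sketch below is not the BL scheme. It is a hypothetical model that keeps only the linear-algebraic relations used in the proof:

- a matrix $H$ of rank at most $h$ with $y^{*T} H = y^T$ and $\langle y, \mathbf{1} \rangle = 1$;
- noiseless "encryptions" $c$ satisfying $\langle y, c \rangle = m$;
- a small prime field $q = 10007$, chosen arbitrarily.

The helper `toy_encrypt` stands in for $\mathrm{Enc}_P$. The function `breaker` itself only ever sees $H$ and fresh encryptions of $0$, and it recovers $\tilde{y}^*$ by Gaussian elimination modulo $q$.

```python
import random

P = 10007  # toy prime field size q (hypothetical)


def inner(u, v):
    """Inner product <u, v> over F_P."""
    return sum(a * b for a, b in zip(u, v)) % P


def mat_vec(M, v):
    """Matrix-vector product over F_P; M is a list of rows."""
    return [inner(row, v) for row in M]


def solve_mod_p(A, b):
    """One solution x of A x = b over F_P (free variables set to 0), or None."""
    rows, cols = len(A), len(A[0])
    M = [[x % P for x in A[i]] + [b[i] % P] for i in range(rows)]
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], P - 2, P)          # inverse via Fermat's little theorem
        M[r] = [x * inv % P for x in M[r]]
        for i in range(rows):                 # clear column c in every other row
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(x - f * z) % P for x, z in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    if any(M[i][cols] for i in range(r, rows)):  # 0 = nonzero: inconsistent
        return None
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][cols]
    return x


def toy_instance(n, h, rng):
    """Hypothetical toy keys: rank-<=h matrix H and y, y_star with
    y_star^T H = y^T and <y, 1> = 1 (the relations used in the proof)."""
    A = [[rng.randrange(P) for _ in range(h)] for _ in range(n)]
    B = [[rng.randrange(P) for _ in range(n)] for _ in range(h)]
    H = [[sum(A[i][t] * B[t][j] for t in range(h)) % P for j in range(n)]
         for i in range(n)]
    while True:
        y_star = [rng.randrange(P) for _ in range(n)]
        y = [inner(col, y_star) for col in zip(*H)]   # y = H^T y_star
        s = sum(y) % P
        if s:
            break
    s_inv = pow(s, P - 2, P)                          # rescale so that <y, 1> = 1
    return H, [v * s_inv % P for v in y], [v * s_inv % P for v in y_star]


def toy_encrypt(y, m, rng):
    """Stand-in for Enc_P(m): uniformly random c with <y, c> = m (no noise)."""
    z = [rng.randrange(P) for _ in range(len(y))]
    shift = (m - inner(y, z)) % P
    return [(zi + shift) % P for zi in z]


def breaker(H, fresh_zero_encryption, k):
    """Steps 1-4: learn y_tilde from k projected encryptions of 0; return a decryptor."""
    n = len(H)
    o_star = mat_vec(H, [1] * n)                                      # o* = H 1
    v_star = [mat_vec(H, fresh_zero_encryption()) for _ in range(k)]  # v_i* = H v_i
    y_tilde = solve_mod_p(v_star + [o_star], [0] * k + [1])
    if y_tilde is None:
        raise ValueError("no consistent y_tilde; some v_i was not decryptable")
    return lambda c: inner(y_tilde, mat_vec(H, c))                    # y_tilde^T H c


rng = random.Random(2013)
n, h = 30, 6
H, y, y_star = toy_instance(n, h, rng)
decrypt = breaker(H, lambda: toy_encrypt(y, 0, rng), k=2 * h)  # sees only H and Enc(0)
messages = [0, 1, 42, 1234, P - 1]
recovered = [decrypt(toy_encrypt(y, m, rng)) for m in messages]
```

The sketch uses $k = 2h$ samples rather than $h^{1+\epsilon}$ because the toy has no decryption noise. Over a field of this size, that many uniform samples from a space of dimension at most $h$ span it except with negligible probability.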

Correctness. To analyze the correctness of the breaker, we first notice that the space of ciphertexts that decrypt to $0$ under $y$ is linear (this is exactly the orthogonal space to $y$). We denote this space by $Z$. Since $\mathbf{1} \notin Z$, we can define the cosets $Z_m = Z + m \cdot \mathbf{1}$. We note that all legal (i.e., correctly decryptable) encryptions of $m$ using $P$ reside in $Z_m$.
We let $Z^*$ denote the space $H \cdot Z$ (all vectors of the form $H \cdot z$ such that $z \in Z$). This is a linear space with dimension at most $h$. Similarly, define $Z^*_m = Z^* + m \cdot o^*$.

Consider the challenge ciphertext $c = \mathrm{Enc}_P(m)$. We can think of $c$ as an encryption of $0$ with an added term $m \cdot \mathbf{1}$. We therefore write $c = c_0 + m \cdot \mathbf{1}$. Again, this yields $c_0^* = H \cdot c_0$ such that $c^* = c_0^* + m \cdot o^*$.

Now consider the distribution $\mathcal{Z}$ over $Z$, which is the distribution of decryptable encryptions of $0$ (i.e., the distribution $c = \mathrm{Enc}_P(0)$, conditioned on $\langle y, c \rangle = 0$). The distribution $\mathcal{Z}^*$ is defined by projecting $\mathcal{Z}$ through $H$. With probability $(1 - n^{-\Omega(1)})$ it holds that $v_1^*, \ldots, v_k^*$, and $c_0^*$ are independent samples from $\mathcal{Z}^*$.

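By Lemma 2.1 below, it holds that $c_0^* \in \mathrm{Span}\{v_1^*, \ldots, v_k^*\}$ with probability $(1 - n^{-\Omega(1)})$. In such case, since $\tilde{y}^*$ is orthogonal to every $v_i^*$, it is also orthogonal to $c_0^*$, and therefore

$$\langle \tilde{y}^*, c^* \rangle = \langle \tilde{y}^*, c_0^* \rangle + m \cdot \langle \tilde{y}^*, o^* \rangle = m.$$

We conclude that with probability $1 - n^{-\Omega(1)}$ our breaker correctly decrypts $c$ as required. □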
Abstract

We construct the first Message Authentication Codes (MACs) that are existentially unforgeable against a quantum chosen message attack. These chosen message attacks model a quantum adversary’s ability to obtain the MAC on a superposition of messages of its choice. We begin by showing that a quantum secure PRF is sufficient for constructing a quantum secure MAC, a fact that is considerably harder to prove than its classical analogue. Next, we show that a variant of Carter-Wegman MACs can be proven to be quantum secure. Unlike in the classical settings, we present an attack showing that a pair-wise independent hash family is insufficient to construct a quantum secure one-time MAC, but we prove that a four-wise independent family is sufficient for one-time security.

Keywords: Quantum computing, MAC, chosen message attacks, post-quantum security


1  Introduction 

Message  Authentication  Codes  (MACs)  are  an  important  building  block  in  cryptography  used  to 
ensure  data  integrity.  A  MAC  system  is  said  to  be  secure  if  an  efficient  attacker  capable  of  mounting 
a  chosen  message  attack  cannot  produce  an  existential  MAC  forgery  (see  Section  2.2). 

With  the  advent  of  quantum  computing  there  is  a  strong  interest  in  post-quantum  cryptography, 
that is, systems that remain secure even when the adversary has access to a quantum computer.
There  are  two  natural  approaches  to  defining  security  of  a  MAC  system  against  a  quantum  adversary. 
One  approach  is  to  restrict  the  adversary  to  issue  classical  chosen  message  queries,  but  then  allow 
the  adversary  to  perform  quantum  computations  between  queries.  Security  in  this  model  can  be 
achieved  by  basing  the  MAC  construction  on  a  quantum  intractable  problem. 

The other, more conservative, approach to defining quantum MAC security is to model the entire
security  game  as  a  quantum  experiment  and  allow  the  adversary  to  issue  quantum  chosen  message 
queries.  That  is,  the  adversary  can  submit  a  superposition  of  messages  from  the  message  space  and 
in  response  receive  a  superposition  of  MAC  tags  on  those  messages.  Informally,  a  quantum  chosen 
message  query  performs  the  following  transformation  on  a  given  superposition  of  messages: 

$$\sum_m \alpha_m |m\rangle \longrightarrow \sum_m \alpha_m |m, S(k, m)\rangle$$

where  S(k,  m)  is  a  tag  on  the  message  m  with  secret  key  k. 
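A classical simulation can make the query transformation concrete. The sketch stores a state as a dictionary from basis labels $(m, y)$ to amplitudes. It applies the common unitary form $|m, y\rangle \mapsto |m, y \oplus S(k, m)\rangle$. That form is an assumption about how the informal map is realized; with the tag register set to $0$, it reduces to the map above. The tag function is a hypothetical stand-in built from HMAC-SHA256, not a MAC from this paper.

```python
import hashlib
import hmac
import math


def toy_tag(key, m):
    """Hypothetical stand-in for S(k, m): the first 16 bits of HMAC-SHA256(k, m)."""
    digest = hmac.new(key, m.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def mac_query(state, key):
    """Apply |m, y> -> |m, y XOR S(k, m)> to a state stored as {(m, y): amplitude}."""
    out = {}
    for (m, y), amp in state.items():
        basis = (m, y ^ toy_tag(key, m))   # a permutation of basis states
        out[basis] = out.get(basis, 0) + amp
    return out


def norm(state):
    return math.sqrt(sum(abs(a) ** 2 for a in state.values()))


key = b"toy key"
psi = {(m, 0): 1 / math.sqrt(8) for m in range(8)}   # uniform superposition, tag register 0
phi = mac_query(psi, key)                            # sum_m alpha_m |m, S(k, m)>
```

The XOR form makes the query a permutation of basis states, so it is unitary and is its own inverse.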

To  define  security,  let  q  be  the  number  of  queries  that  the  adversary  issues  by  the  end  of  the 
game.  Clearly  it  can  now  produce  q  classical  message-tag  pairs  by  sampling  the  q  superpositions 
it received from the MAC signing oracle. We say that the MAC system is quantum secure if the
adversary  cannot  produce  q  +  1  valid  message-tag  pairs.  This  captures  the  fact  that  the  adversary 
cannot  do  any  better  than  trivially  sampling  the  responses  from  its  MAC  signing  oracle  and  is  the 
quantum  analogue  of  a  classical  existential  forgery. 

1.1  Our  results 

In  this  paper  we  construct  the  first  quantum  secure  MAC  systems.  We  begin  with  a  definition 
of  quantum  secure  MACs  and  give  an  example  of  a  MAC  system  that  is  secure  against  quantum 
adversaries  capable  of  classical  chosen  message  queries,  but  is  insecure  when  the  adversary  can  issue 
quantum  chosen  message  queries.  We  then  present  a  number  of  quantum  secure  MAC  systems. 

Quantum secure MACs. In the classical settings many MAC systems are based on the observation that a secure pseudorandom function gives rise to a secure MAC [BKR00, BCK96]. We begin by studying the same question in the quantum settings. Very recently Zhandry [Zha12b] defined the concept of a quantum secure pseudorandom function (PRF), which is a PRF that remains indistinguishable from a random function even when the adversary can issue quantum queries to the PRF. He showed that the classic GGM construction [GGM86] remains secure under quantum queries assuming the underlying pseudorandom generator is quantum secure.

The first question we study is whether a quantum secure PRF gives rise to a quantum secure MAC, as in the classical settings. To the MAC adversary a quantum secure PRF is indistinguishable from a random function. Therefore proving that the MAC is secure amounts to proving that with $q$ quantum queries to a random oracle $H$ no adversary can produce $q + 1$ input-output pairs of $H$ with non-negligible probability. In the classical settings where the adversary can only issue classical queries to $H$ this is trivial: given $q$ evaluations of a random function, the adversary learns nothing about the value of the function at other points. Unfortunately, this argument fails under quantum queries because the response to a single quantum query to $H : \mathcal{X} \to \mathcal{Y}$ contains information about all of $H$. In fact, with a single quantum query the adversary can produce two input-output pairs of $H$ with probability about $2/|\mathcal{Y}|$ (with classical queries the best possible is $1/|\mathcal{Y}|$). As a result, proving that $q$ quantum queries are insufficient to produce $q + 1$ input-output pairs is quite challenging. We prove tight upper and lower bounds on this question by proving the following theorem:

Theorem 1.1 (informal). Let $H : \mathcal{X} \to \mathcal{Y}$ be a random oracle. Then an adversary making at most $q < |\mathcal{X}|$ quantum queries to $H$ will produce $q + 1$ input-output pairs of $H$ with probability at most $(q + 1)/|\mathcal{Y}|$. Furthermore, when $q \ll |\mathcal{Y}|$ there is an algorithm that with $q$ quantum queries to $H$ will produce $q + 1$ input-output pairs of $H$ with probability $1 - (1 - 1/|\mathcal{Y}|)^{q+1} \approx (q + 1)/|\mathcal{Y}|$.
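To see how tight Theorem 1.1 is, the two probabilities can be compared numerically. The range size $|\mathcal{Y}| = 2^{20}$ and the query counts below are arbitrary illustrative choices. The gap is tiny because $1 - (1-x)^{q+1} \le (q+1)x$ for $0 \le x \le 1$.

```python
import math


def achievable(q, y_size):
    """1 - (1 - 1/|Y|)^(q+1): success probability of the algorithm in Theorem 1.1."""
    return -math.expm1((q + 1) * math.log1p(-1 / y_size))


def upper_bound(q, y_size):
    """(q + 1)/|Y|: the bound on any q-query adversary from Theorem 1.1."""
    return (q + 1) / y_size


y_size = 2 ** 20
for q in (1, 10, 100, 1000):
    lo, hi = achievable(q, y_size), upper_bound(q, y_size)
    print(f"q={q:5d}  achievable={lo:.6e}  bound={hi:.6e}  ratio={hi / lo:.6f}")
```

The functions `expm1` and `log1p` avoid the cancellation that computing `1 - (1 - 1/y) ** (q + 1)` directly would suffer for large $|\mathcal{Y}|$.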

The first part of the theorem is the crucial fact needed to build quantum secure MACs and is the harder part to prove. It shows that when $|\mathcal{Y}|$ is large any algorithm has only a negligible chance of producing $q + 1$ input-output pairs of $H$ from $q$ quantum queries. To prove this bound we introduce a new lower-bound technique we call the rank method for bounding the success probability of algorithms that succeed with only small probability. Existing quantum lower bound techniques such as the polynomial method [BBC+01] and the adversary method [Amb00, Aar02, Amb06, ASdW09] do not give the result we need. One difficulty with existing lower bound techniques is that they generally prove asymptotic bounds on the number of queries required to solve a problem with high probability, whereas we need a bound on the success probability of an algorithm making a limited number of queries. Attempting to apply existing techniques to our problem at best only bounds the success probability away from 1 by an inverse polynomial factor, which is insufficient for our purposes. The rank method for proving quantum lower bounds overcomes these difficulties and is a general tool that can be used in other post-quantum security proofs.

The second part of Theorem 1.1 shows that the lower bound presented in the first part of the theorem is tight. A related algorithm was previously presented by van Dam [vD98], but only for oracles outputting one bit, namely when $\mathcal{Y} = \{0, 1\}$. For such a small range only about $|\mathcal{X}|/2$ quantum queries are needed to learn the oracle at all $|\mathcal{X}|$ points. A special case where $\mathcal{Y} = \mathcal{X} = \{0, 1\}$ and $q = 1$ was developed independently by Kerenidis and de Wolf [KdW03]. Our algorithm is a generalization of van Dam’s result to multi-bit oracles.

Quantum secure Carter-Wegman MACs. A Carter-Wegman MAC [WC81] signs a message $m$ by computing $(r, h(m) \oplus F(k, r))$, where $h$ is a secret hash function chosen from an xor-universal hash family, $F$ is a secure PRF with secret key $k$, and $r$ is a short random nonce. The attractive feature of Carter-Wegman MACs is that the long message $m$ is hashed by a fast xor-universal hash $h$. We show that a slightly modified Carter-Wegman MAC is quantum secure assuming the underlying PRF is quantum secure in the sense of Zhandry [Zha12b].
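A classical Carter-Wegman MAC can be sketched in a few lines. This is only the textbook construction described above, not the paper's modified, quantum secure variant. Every concrete choice below is our assumption:

- $h_a(m) = a \cdot m$ in $\mathrm{GF}(2^{64})$, an xor-universal family for single 64-bit blocks;
- HMAC-SHA256 truncated to 64 bits as the PRF $F$;
- a 128-bit random nonce $r$.

```python
import hashlib
import hmac
import secrets

# x^64 + x^4 + x^3 + x + 1, an irreducible polynomial commonly used for GF(2^64).
POLY = (1 << 64) | 0b11011


def gf64_mul(a, b):
    """Carry-less multiplication of a and b, reduced modulo POLY (arithmetic in GF(2^64))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 64:          # degree reached 64: reduce
            a ^= POLY
    return result


def prf(k, r):
    """Stand-in PRF F(k, r): HMAC-SHA256 truncated to 64 bits (an assumption)."""
    return int.from_bytes(hmac.new(k, r, hashlib.sha256).digest()[:8], "big")


def cw_sign(hash_key, prf_key, m):
    """Tag (r, h(m) XOR F(k, r)) for a single 64-bit block m, with h(m) = hash_key * m."""
    r = secrets.token_bytes(16)
    return r, gf64_mul(hash_key, m) ^ prf(prf_key, r)


def cw_verify(hash_key, prf_key, m, tag):
    r, t = tag
    expected = gf64_mul(hash_key, m) ^ prf(prf_key, r)
    return hmac.compare_digest(expected.to_bytes(8, "big"), t.to_bytes(8, "big"))


hash_key = secrets.randbits(64) | 1      # any nonzero field element
prf_key = secrets.token_bytes(32)
tag = cw_sign(hash_key, prf_key, 42)
print(cw_verify(hash_key, prf_key, 42, tag), cw_verify(hash_key, prf_key, 43, tag))  # True False
```

Because field multiplication distributes over XOR, $h_a(m) \oplus h_a(m') = a \cdot (m \oplus m')$. That is the xor-universal property the construction relies on. A real deployment would hash multi-block messages with a polynomial hash instead.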

One-time  quantum  secure  MACs.  A  one-time  MAC  is  existentially  unforgeable  when  the 
adversary  can  make  only  a  single  chosen  message  query.  Classically,  one-time  MACs  are  constructed 
from  pair-wise  independent  hash  functions  [WC81].  These  MACs  are  one-time  secure  since  the  value 
of  a  pair-wise  independent  hash  at  one  point  gives  no  information  about  its  value  at  another  point. 
Therefore,  a  single  classical  chosen-message  query  tells  the  adversary  nothing  about  the  MAC  tag  of 
another  message. 
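The classical one-time argument can be checked exhaustively for a tiny field. The sketch uses the pair-wise independent family $h(x) = ax + b$ over a hypothetical $\mathbb{F}_{31}$. It counts how many keys are consistent with one observed message-tag pair together with each possible tag of a second message. The count is exactly one for every candidate, so the observed pair leaves the second tag uniformly distributed.

```python
P = 31  # hypothetical small prime; the field is F_P


def tag(key, m):
    """One-time MAC tag from the pair-wise independent family h(x) = a*x + b over F_P."""
    a, b = key
    return (a * m + b) % P


def consistent_keys(m, t, m2, t2):
    """Number of keys (a, b) that tag m as t and m2 as t2."""
    return sum(1 for a in range(P) for b in range(P)
               if tag((a, b), m) == t and tag((a, b), m2) == t2)


# After seeing the pair (m, t) = (3, 17), every candidate tag t2 for m2 = 5
# is consistent with exactly one key, so t2 is still uniformly distributed.
counts = [consistent_keys(3, 17, 5, t2) for t2 in range(P)]
print(set(counts))  # {1}
```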

In the quantum settings things are more complicated. Unlike the classical settings, we show that pair-wise independence does not imply existential unforgeability under a one-time quantum chosen message attack. For example, consider the simple pair-wise independent hash family $\mathcal{H} = \{h(x) = ax + b\}_{a,b \in \mathbb{F}_p}$ with domain and range $\mathbb{F}_p$. We show that a quantum adversary presented with an oracle for a random function $h \in \mathcal{H}$ can find both $a$ and $b$ with a single quantum query to $h$. Consequently, the classical one-time MAC constructed from $\mathcal{H}$ is completely insecure in the quantum settings. More generally we prove the following theorem:

Theorem 1.2 (informal). There is a polynomial time quantum algorithm that when presented with an oracle for $h(x) = a_0 + a_1 x + \cdots + a_d x^d$ for random $a_0, \ldots, a_d$ in $\mathbb{F}_n$ can recover $a_0, \ldots, a_d$ using only $d$ quantum queries to the oracle with probability $1 - O(d/n)$.

The $h(x) = ax + b$ attack discussed above is a special case of this theorem with $d = 1$. With classical queries finding $a_0, \ldots, a_d$ requires $d + 1$ queries, but with quantum queries the theorem shows that $d$ queries are sufficient.

Theorem  1.2  is  a  quantum  polynomial  interpolation  algorithm:  given  oracle  access  to  the 
polynomial,  the  algorithm  reconstructs  its  coefficients.  This  problem  was  studied  previously  by 
Kane  and  Kutin  [KK11]  who  prove  that  d/2  quantum  queries  are  insufficient  to  interpolate  the 
polynomial.  Interestingly,  they  conjecture  that  quantum  interpolation  requires  d  +  1  quantum 
queries  as  in  the  classical  case,  but  Theorem  1.2  refutes  that  conjecture.  Theorem  1.2  also  applies 
to  a  quantum  version  of  secret  sharing  where  the  shares  themselves  are  superpositions.  It  shows 
that  the  classical  Shamir  secret  sharing  scheme  [Sha79]  is  insecure  if  the  shares  are  allowed  to  be 
quantum  states  obtained  by  evaluating  the  secret  sharing  polynomial  on  quantum  superpositions. 
More generally, the security of secret sharing schemes in the quantum settings was analyzed by Damgård et al. [DFNS11].

As for one-time secure MACs, while pair-wise independence is insufficient for quantum one-time security, we show that four-wise independence is sufficient. That is, a four-way independent hash family gives rise to an existentially unforgeable MAC under a one-time quantum chosen message attack. It is still an open problem whether three-way independence is sufficient. More generally, we show that $(q + 1)$-way independence is insufficient for a $q$-time quantum secure MAC, but $(3q + 1)$-way independence is sufficient.

Motivation.  Allowing  the  adversary  to  issue  quantum  chosen  message  queries  is  a  natural  and 
conservative  security  model  and  is  therefore  an  interesting  one  to  study.  Showing  that  classical 
MAC  constructions  remain  secure  in  this  model  gives  confidence  in  case  end-user  computing  devices 
eventually  become  quantum.  Nevertheless,  one  might  imagine  that  even  in  a  future  where  computers 
are  quantum,  the  last  step  in  a  MAC  signing  procedure  is  to  sample  the  resulting  quantum  state  so 
that  the  generated  MAC  is  always  classical.  The  quantum  chosen  message  query  model  ensures 
that  even  if  the  attacker  can  bypass  this  last  “classicalization”  step,  the  MAC  remains  secure. 

As  further  motivation  we  note  that  the  results  in  this  paper  are  the  tip  of  a  large  emerging 
area  of  research  with  many  open  questions.  Consider  for  example  signature  schemes.  Can  one 
design  schemes  that  remain  secure  when  the  adversary  can  issue  quantum  chosen  message  queries? 
Similarly, can one design encryption systems that remain secure when the adversary can issue
quantum  chosen  ciphertext  queries?  More  generally,  for  any  cryptographic  primitive  modeled  as  an 
interactive  game,  one  can  ask  how  to  design  primitives  that  remain  secure  when  the  interaction 
between  the  adversary  and  its  given  oracles  is  quantum. 

Other related work. Several recent works study the security of cryptographic primitives when the adversary can issue quantum queries [BDF+11, Zha12a, Zha12b]. So far these have focused on proving security of signatures, encryption, and identity-based encryption in the quantum random oracle model, where the adversary can query the random oracle on superpositions of inputs. These works show that many, but not all, random oracle constructions remain secure in the quantum random oracle model. The quantum random oracle model has also been used to prove security of Merkle’s Puzzles in the quantum settings [BS08, BHK+11]. Meanwhile, Damgård et al. [DFNS11] examine secret sharing and multiparty computation in a model where an adversary may corrupt a superposition of subsets of players, and build zero knowledge protocols that are secure even when a dishonest verifier can issue challenges on superpositions.

Some progress toward identifying sufficient conditions under which classical protocols are also quantum immune has been made by Unruh [Unr10] and Hallgren et al. [HSS11]. Unruh shows that any scheme that is statistically secure in Canetti’s universal composition (UC) framework [Can01] against classical adversaries is also statistically secure against quantum adversaries. Hallgren et al. show that for many schemes this is also true in the computational setting. These results, however, do not apply to MACs.

2  Preliminaries:  Definitions  and  Notation 

Let $[n]$ be the set $\{1, \ldots, n\}$. For a prime power $n$, let $\mathbb{F}_n$ be the finite field on $n$ elements. For any positive integer $n$, let $\mathbb{Z}_n$ be the ring of integers modulo $n$.
Functions will be denoted by capital letters (such as $F$), and sets by capital script letters (such as $\mathcal{X}$). We denote vectors with bold lower-case letters (such as $\mathbf{v}$), and the components of a vector $\mathbf{v} \in \mathcal{A}^n$ by $v_i$, $i \in [n]$. We denote matrices with bold capital letters (such as $\mathbf{M}$), and the components of a matrix $\mathbf{M} \in \mathcal{A}^{m \times n}$ by $M_{i,j}$, $i \in [m]$, $j \in [n]$. Given a function $F : \mathcal{X} \to \mathcal{Y}$ and a vector $\mathbf{v} \in \mathcal{X}^n$, let $F(\mathbf{v})$ denote the vector $(F(v_1), F(v_2), \ldots, F(v_n))$. Let $F([n])$ denote the vector $(F(1), F(2), \ldots, F(n))$.

Given a vector space $V$, let $\dim V$ be the dimension of $V$, or the number of vectors in any basis for $V$. Given a set of vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$, let $\mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ denote the space of all linear combinations of vectors in $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$. Given a subspace $S$ of an inner-product space $V$, and a vector $\mathbf{v} \in V$, define $\mathrm{proj}_S \mathbf{v}$ as the orthogonal projection of $\mathbf{v}$ onto $S$, that is, the vector $\mathbf{w} \in S$ such that $|\mathbf{v} - \mathbf{w}|$ is minimized.

Given  a  matrix  M,  we  define  the  rank,  denoted  rank(M),  to  be  the  size  of  the  largest  subset  of 
rows  (equivalently,  columns)  of  M  that  are  linearly  independent. 

Given a function $F:\mathcal{X}\to\mathcal{Y}$ and a subset $S\subseteq\mathcal{X}$, the restriction of $F$ to $S$ is the function $F_S:S\to\mathcal{Y}$ where $F_S(x)=F(x)$ for all $x\in S$. A distribution $D$ on the set of functions $F$ from $\mathcal{X}$ to $\mathcal{Y}$ induces a distribution $D_S$ on the set of functions from $S$ to $\mathcal{Y}$, where we sample from $D_S$ by first sampling a function $F$ from $D$ and outputting $F_S$. We say that $D$ is $k$-wise independent if, for each set $S$ of size at most $k$, the distribution $D_S$ is the truly random (uniform) distribution on functions from $S$ to $\mathcal{Y}$. A set $\mathcal{F}$ of functions from $\mathcal{X}$ to $\mathcal{Y}$ is $k$-wise independent if the uniform distribution on $\mathcal{F}$ is $k$-wise independent.
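As a concrete illustration of this definition (the example is not taken from the text), the standard affine family $h_{a,b}(x)=ax+b \bmod p$ over $\mathbb{Z}_p$ is pairwise independent: for distinct $x_1,x_2$, the map $(a,b)\mapsto(h_{a,b}(x_1),h_{a,b}(x_2))$ is a bijection of $\mathbb{Z}_p^2$. The Python sketch below checks this by exhaustive enumeration for a small prime; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def affine_family(p):
    """All functions x -> (a*x + b) mod p on Z_p, each stored as a lookup table."""
    return [tuple((a * x + b) % p for x in range(p)) for a, b in product(range(p), repeat=2)]


def is_pairwise_independent(family, domain_size, range_size):
    """True if, for every pair of distinct inputs, a uniformly random member of the
    family maps that pair to a uniformly random pair of outputs."""
    expected = len(family) / range_size ** 2  # count each output pair should receive
    for x1, x2 in product(range(domain_size), repeat=2):
        if x1 == x2:
            continue
        counts = Counter((f[x1], f[x2]) for f in family)
        if len(counts) != range_size ** 2 or any(c != expected for c in counts.values()):
            return False
    return True


print(is_pairwise_independent(affine_family(5), 5, 5))  # True
```

The check enumerates every member of the family, so it is only meant for tiny parameters.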

2.1  Quantum  Computation 

A quantum system is a complex Hilbert space $\mathcal{H}$ with inner product $\langle\cdot|\cdot\rangle$. The state of a quantum system is given by a vector $|\psi\rangle$ of unit norm ($\langle\psi|\psi\rangle=1$). Given quantum systems $\mathcal{H}_1$ and $\mathcal{H}_2$, the joint quantum system is given by the tensor product $\mathcal{H}_1\otimes\mathcal{H}_2$. Given $|\psi_1\rangle\in\mathcal{H}_1$ and $|\psi_2\rangle\in\mathcal{H}_2$, the product state is given by $|\psi_1\rangle|\psi_2\rangle\in\mathcal{H}_1\otimes\mathcal{H}_2$. Given a quantum state $|\psi\rangle$ and an orthonormal basis $B=\{|b_0\rangle,\dots,|b_{d-1}\rangle\}$ for $\mathcal{H}$, a measurement of $|\psi\rangle$ in the basis $B$ results in a value $b_i$ with probability $|\langle b_i|\psi\rangle|^2$, and the state $|\psi\rangle$ is collapsed to the state $|b_i\rangle$. We let $b_i\leftarrow|\psi\rangle$ denote the distribution on $b_i$ obtained by sampling $|\psi\rangle$.
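The measurement rule can be sketched numerically. In the Python sketch below (an illustration only), states and basis vectors are plain lists of complex amplitudes, and `measurement_distribution` is a hypothetical helper that returns $|\langle b_i|\psi\rangle|^2$ for each basis vector.

```python
import math


def inner(u, v):
    """<u|v>, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def measurement_distribution(psi, basis):
    """Probability |<b_i|psi>|^2 of each outcome when psi is measured in `basis`."""
    return [abs(inner(b, psi)) ** 2 for b in basis]


s = 1 / math.sqrt(2)
plus = [s + 0j, s + 0j]                             # |+> = (|0> + |1>)/sqrt(2)
computational = [[1 + 0j, 0j], [0j, 1 + 0j]]        # {|0>, |1>}
hadamard = [[s + 0j, s + 0j], [s + 0j, -s + 0j]]    # {|+>, |->}

print(measurement_distribution(plus, computational))  # both entries close to 0.5
print(measurement_distribution(plus, hadamard))       # close to [1.0, 0.0]
```

The same state gives a uniform outcome in one basis and a deterministic outcome in the other, which is why the choice of measurement basis matters in the arguments that follow.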

A unitary transformation over a $d$-dimensional Hilbert space $\mathcal{H}$ is a $d\times d$ matrix $U$ such that $UU^\dagger=I_d$, where $U^\dagger$ represents the conjugate transpose. A quantum algorithm operates on a product space $\mathcal{H}_{\mathrm{in}}\otimes\mathcal{H}_{\mathrm{out}}\otimes\mathcal{H}_{\mathrm{work}}$ and consists of $n$ unitary transformations $U_1,\dots,U_n$ in this space. $\mathcal{H}_{\mathrm{in}}$ represents the input to the algorithm, $\mathcal{H}_{\mathrm{out}}$ the output, and $\mathcal{H}_{\mathrm{work}}$ the work space. A classical input $x$ to the quantum algorithm is converted to the quantum state $|x,0,0\rangle$. Then, the unitary transformations are applied one by one, resulting in the final state

$$|\psi_x\rangle = U_n\cdots U_1|x,0,0\rangle .$$

The final state is measured, obtaining $(a,b,c)$ with probability $|\langle a,b,c|\psi_x\rangle|^2$. The output of the algorithm is $b$.

Quantum-accessible Oracles. We will implement an oracle $O:\mathcal{X}\to\mathcal{Y}$ by a unitary transformation $O$ where

$$O|x,y,z\rangle = |x,y+O(x),z\rangle$$

where $+:\mathcal{Y}\times\mathcal{Y}\to\mathcal{Y}$ is some group operation on $\mathcal{Y}$. Suppose we have a quantum algorithm that makes quantum queries to oracles $O_1,\dots,O_q$. Let $|\psi_0\rangle$ be the state of the algorithm before any queries, and let $U_1,\dots,U_q$ be the unitary transformations applied between queries. The final state of the algorithm will be

$$U_qO_q\cdots U_1O_1|\psi_0\rangle$$

We can also have an algorithm make classical queries to $O_i$. In this case, the input to the oracle is measured before applying the transformation $O_i$.
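Because $+$ is a group operation, the oracle transformation only relabels computational basis states, so it is a permutation and hence unitary. The sketch below assumes $\mathcal{Y}=\mathbb{Z}_n$ with addition mod $n$, drops the $z$ register (which the oracle leaves untouched), and stores a state as a dictionary from basis labels $(x,y)$ to amplitudes; all names are hypothetical.

```python
def apply_oracle(state, oracle, n):
    """Map |x, y> -> |x, y + O(x) mod n> on a state stored as {(x, y): amplitude}.
    Y = Z_n with addition mod n is an illustrative choice of group."""
    out = {}
    for (x, y), amp in state.items():
        key = (x, (y + oracle[x]) % n)
        out[key] = out.get(key, 0) + amp
    return out


def oracle_is_permutation(oracle, domain_size, n):
    """The oracle map is a bijection on basis states, which makes it unitary."""
    images = {(x, (y + oracle[x]) % n) for x in range(domain_size) for y in range(n)}
    return len(images) == domain_size * n


O = [3, 0, 2]                                      # a toy oracle O : {0, 1, 2} -> Z_4
uniform = {(x, 0): 3 ** -0.5 for x in range(3)}    # (1/sqrt 3) sum_x |x, 0>
print(apply_oracle(uniform, O, 4))                 # one query answers every x in superposition
print(oracle_is_permutation(O, 3, 4))              # True
```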

Fix an oracle $O:\mathcal{X}\to\mathcal{Y}$. Let $O^{(q)}:\mathcal{X}^q\to\mathcal{Y}^q$ be the oracle that maps $\mathbf{x}$ into $O(\mathbf{x})=(O(x_1),O(x_2),\dots,O(x_q))$. Observe that any quantum query to $O^{(q)}$ can be implemented using $q$ quantum queries to $O$, where the unitary transformations between queries just permute the registers. We say that an algorithm that makes a single query to $O^{(q)}$ makes $q$ non-adaptive queries to $O$.

The Density Matrix. Suppose the state of a quantum system depends on some hidden random variable $z\in\mathcal{Z}$, which is distributed according to a distribution $D$. That is, if the hidden variable is $z$, the state of the system is $|\psi_z\rangle$. We can then define the density matrix of the quantum system as

$$\rho = \sum_{z\in\mathcal{Z}}\Pr_D[z]\,|\psi_z\rangle\langle\psi_z|$$

Applying a unitary matrix $U$ to the quantum state corresponds to the transformation

$$\rho \to U\rho U^\dagger$$

A partial measurement on some registers has the effect of zeroing out the terms in $\rho$ where those registers are not equal. For example, if we have two registers $x$ and $y$, and we measure the $x$ register, then the new density matrix is

$$\rho'_{x,y,x',y'} = \begin{cases}\rho_{x,y,x',y'} & \text{if } x=x'\\ 0 & \text{otherwise}\end{cases}$$
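A small numerical sketch of the density matrix and of a partial measurement, assuming two registers whose basis state $|x,y\rangle$ is stored at index $x\cdot d_y+y$ (here two one-qubit registers). The helper names are hypothetical. Measuring $x$ on the entangled state $(|0,0\rangle+|1,1\rangle)/\sqrt2$ removes the off-diagonal entries that link $x=0$ and $x=1$.

```python
def density_matrix(ensemble):
    """rho = sum_z Pr[z] |psi_z><psi_z| for an ensemble [(probability, state), ...]."""
    dim = len(ensemble[0][1])
    rho = [[0j] * dim for _ in range(dim)]
    for prob, psi in ensemble:
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += prob * psi[i] * psi[j].conjugate()
    return rho


def measure_first_register(rho, dim_x, dim_y):
    """Measuring register x zeroes every entry linking |x, y> and |x', y'> with x != x'.
    The basis state |x, y> is stored at index x * dim_y + y."""
    size = dim_x * dim_y
    return [[rho[i][j] if i // dim_y == j // dim_y else 0j for j in range(size)]
            for i in range(size)]


h = 2 ** -0.5
bell = [h + 0j, 0j, 0j, h + 0j]                    # (|0,0> + |1,1>)/sqrt(2)
rho = density_matrix([(1.0, bell)])
rho_after = measure_first_register(rho, 2, 2)
print(rho[0][3], rho_after[0][3])                  # the coherence term is removed
```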


2.2  Quantum  secure  MACs 

A MAC system comprises two algorithms: a (possibly) randomized MAC signing algorithm $S(k,m)$ and a MAC verification algorithm $V(k,m,t)$. Here $k$ denotes the secret key chosen at random from the key space, $m$ denotes a message in the message space, and $t$ denotes the MAC tag in the tag space on the message $m$. These algorithms and spaces are parameterized by a security parameter $\lambda$.

Classically, a MAC system is said to be secure if no attacker can win the following game: a random key $k$ is chosen from the key space and the attacker is presented with a signing oracle $S(k,\cdot)$. Queries to the signing oracle are called chosen message queries. Let $\{(m_i,t_i)\}_{i=1}^{q}$ be the set of message-tag pairs that the attacker obtains by interacting with the signing oracle. The attacker wins the game if it can produce an existential forgery, namely a valid message-tag pair $(m^*,t^*)$ satisfying $(m^*,t^*)\notin\{(m_i,t_i)\}_{i=1}^{q}$. The MAC system is said to be secure if no “efficient” adversary can win this game with non-negligible probability in $\lambda$.
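The classical game can be sketched as a small harness. The text does not fix a particular MAC, so the sketch below instantiates $S$ and $V$ with HMAC-SHA256 from Python's standard library purely as an assumption for illustration. The two adversaries are toy examples, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(key, msg):
    """S(k, m): deterministic HMAC-SHA256 tag (an illustrative choice of MAC)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key, msg, tag):
    """V(k, m, t): recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, msg), tag)


def forgery_game(adversary, key_len=32):
    """One run of the classical existential-forgery game; True means the adversary won."""
    key = secrets.token_bytes(key_len)
    transcript = set()                      # the pairs (m_i, t_i) handed out so far

    def signing_oracle(msg):
        tag = sign(key, msg)
        transcript.add((msg, tag))
        return tag

    msg_star, tag_star = adversary(signing_oracle)
    # A win requires a valid pair that the signing oracle did not already produce.
    return verify(key, msg_star, tag_star) and (msg_star, tag_star) not in transcript


def replay_adversary(oracle):
    """Replays an oracle answer, which never counts as a forgery."""
    return b"hello", oracle(b"hello")


def guessing_adversary(oracle):
    """Guesses a tag for an unqueried message; it succeeds with probability 2**-256."""
    return b"never queried", secrets.token_bytes(32)


print(forgery_game(replay_adversary), forgery_game(guessing_adversary))
```

Both runs print False except with negligible probability: replaying a pair from the transcript is excluded by the winning condition, and guessing a 256-bit tag essentially never succeeds.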
Quantum chosen message queries. In the quantum setting we allow the adversary to maintain its own quantum state and issue quantum queries to the signing oracle. Let $\sum_{m,x,y}\psi_{m,x,y}|m,x,y\rangle$ be the adversary’s state just prior to issuing a signing query. The MAC signing oracle transforms this state as follows:

1. it chooses a random string $r$ that will be used by the MAC signing algorithm,

2. it signs each “slot” in the given superposition by running $S(k,m;r)$, that is, running algorithm $S$ with randomness $r$. More precisely, the signing oracle performs the following transformation:

$$\sum_{m,x,y}\psi_{m,x,y}|m,x,y\rangle \longrightarrow \sum_{m,x,y}\psi_{m,x,y}|m,x\oplus S(k,m;r),y\rangle$$

When  the  signing  algorithm  is  deterministic  there  is  no  need  to  choose  an  r.  However,  for  randomized 
signing  algorithms  the  same  randomness  is  used  to  compute  the  tag  for  all  slots  in  the  superposition. 
Alternatively,  we  could  have  required  fresh  randomness  in  every  slot,  but  this  would  make  it  harder 
to  implement  the  MAC  system  on  a  quantum  device.  Allowing  the  same  randomness  in  every  slot 
is more conservative and frees the signer from this concern. At any rate, the two models are very close: if need be, the random string $r$ can be used as a key for a quantum-secure PRF [Zha12b], which is used to generate a fresh pseudorandom value for every slot.
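With the randomness $r$ fixed, the signing transformation XORs a value that depends only on $m$ into the $x$ register, so it permutes basis states, and applying it twice gives the identity. The Python sketch below illustrates this on a two-slot superposition. `toy_sign` is a hypothetical one-byte stand-in for $S(k,m;r)$ built from HMAC-SHA256 and is not a secure MAC.

```python
import hashlib
import hmac


def toy_sign(key, msg, r):
    """Hypothetical randomized signer S(k, m; r): the first byte of HMAC-SHA256(k, r || m).
    A one-byte tag keeps the example small; it is not a secure MAC."""
    return hmac.new(key, r + msg, hashlib.sha256).digest()[0]


def quantum_signing_query(state, key, r):
    """Map sum psi_{m,x,y} |m, x, y> to sum psi_{m,x,y} |m, x XOR S(k, m; r), y>,
    using the same randomness r in every slot. States are {(m, x, y): amplitude}."""
    return {(m, x ^ toy_sign(key, m, r), y): amp for (m, x, y), amp in state.items()}


key, r = b"secret key", b"\x07"
state = {(b"msg-a", 0, 0): 0.6, (b"msg-b", 0, 0): 0.8}   # superposition over two messages
signed = quantum_signing_query(state, key, r)
print(signed)
print(quantum_signing_query(signed, key, r) == state)      # True: the map is its own inverse
```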

Existential  forgery.  After  issuing  q  quantum  chosen  message  queries  the  adversary  wins  the 
game  if  it  can  generate  q  +  1  valid  classical  message-tag  pairs. 

Definition 2.1. A MAC system is existentially unforgeable under a quantum chosen message attack (EUF-qCMA) if no adversary can win the quantum MAC game with non-negligible advantage in $\lambda$.

Zhandry [Zha12b] gives an example of a classically secure PRF that is insecure under quantum queries. This PRF gives an example MAC that is classically secure, but insecure under quantum queries. Our goal for the remainder of the paper is to construct EUF-qCMA secure MACs.

3  The  Rank  Method 

In this section we introduce the rank method, which is a general approach to proving lower bounds on quantum algorithms. The setup is as follows: we give a quantum algorithm $A$ access to some quantity $H\in\mathcal{H}$. By access, we mean that the final state of the algorithm is some fixed function of $H$. In this paper, $\mathcal{H}$ will be a set of functions, and $A$ will be given oracle access to $H\in\mathcal{H}$ by allowing $A$ to make $q$ quantum oracle queries to $H$, for some $q$. For now, we will treat $\mathcal{H}$ abstractly, and return to the specific case where $\mathcal{H}$ is a set of functions later.

The  idea  behind  the  rank  method  is  that,  if  we  treat  the  final  states  of  the  algorithm  on 
different  H  as  vectors,  the  space  spanned  by  these  vectors  will  be  some  subspace  of  the  overall 
Hilbert  space.  If  the  dimension  of  this  subspace  is  small  enough,  the  subspace  (and  hence  all  of 
the  vectors  in  it)  must  be  reasonably  far  from  most  of  the  vectors  in  the  measurement  basis.  This 
allows  us  to  bound  the  ability  of  such  an  algorithm  to  achieve  some  goal. 

For $H\in\mathcal{H}$, let $|\psi_H\rangle$ be the final state of the quantum algorithm $A$, before measurement, when given access to $H$. Suppose the different $|\psi_H\rangle$ vectors all lie in a space of dimension $d$. Let $\Psi_{A,\mathcal{H}}$ be the $|\mathcal{H}|\times d$ matrix whose rows are the various vectors $|\psi_H\rangle$.
Definition 3.1. For a quantum algorithm $A$ given access to some value $H\in\mathcal{H}$, we define the rank, denoted $\mathrm{rank}(A,\mathcal{H})$, as the rank of the matrix $\Psi_{A,\mathcal{H}}$.

The rank of an algorithm $A$ seemingly contains very little information: it gives the dimension of the subspace spanned by the $|\psi_H\rangle$ vectors, but gives no indication of the orientation of this subspace or the positions of the $|\psi_H\rangle$ vectors in the subspace. Nonetheless, we demonstrate how the success probability of an algorithm can be bounded from above knowing only the rank of $\Psi_{A,\mathcal{H}}$.

Theorem 3.2. Let $A$ be a quantum algorithm that has access to some value $H\in\mathcal{H}$ drawn from some distribution $D$ and produces some output $w\in\mathcal{W}$. Let $R:\mathcal{H}\times\mathcal{W}\to\{\mathrm{True},\mathrm{False}\}$ be a binary relation. Then the probability that $A$ outputs some $w$ such that $R(H,w)=\mathrm{True}$ is at most

$$\left(\max_{w\in\mathcal{W}}\Pr_{H\leftarrow D}[R(H,w)]\right)\times\mathrm{rank}(A,\mathcal{H}) .$$

In other words, the probability that $A$ succeeds in producing $w\in\mathcal{W}$ for which $R(H,w)$ is true is at most $\mathrm{rank}(A,\mathcal{H})$ times the best probability of success of any algorithm that ignores $H$ and just outputs some fixed $w$.


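Proof. The probability that $A$ outputs a $w$ such that $R(H,w)=\mathrm{True}$ is

$$\Pr_{H\leftarrow D,\ w\leftarrow|\psi_H\rangle}[R(H,w)] = \sum_{H}\Pr_D[H]\sum_{w:R(H,w)}|\langle w|\psi_H\rangle|^2 = \sum_{w}\sum_{H:R(H,w)}\Pr_D[H]\,|\langle w|\psi_H\rangle|^2 .$$

Now, $|\langle w|\psi_H\rangle|$ is just the magnitude of the projection of $|w\rangle$ onto the space spanned by the vector $|\psi_H\rangle$, that is, $\|\mathrm{proj}_{\mathrm{span}\{|\psi_H\rangle\}}(|w\rangle)\|$. This is at most the magnitude of the projection of $|w\rangle$ onto the space spanned by all of the $|\psi_{H'}\rangle$ for $H'\in\mathcal{H}$, or $\|\mathrm{proj}_{\mathrm{span}\{|\psi_{H'}\rangle\}}(|w\rangle)\|$. Thus,

$$\Pr_{H\leftarrow D,\ w\leftarrow|\psi_H\rangle}[R(H,w)] \le \sum_{w}\sum_{H:R(H,w)}\Pr_D[H]\,\|\mathrm{proj}_{\mathrm{span}\{|\psi_{H'}\rangle\}}(|w\rangle)\|^2 .$$

Now, we can perform the sum over $H$, which gives $\Pr_{H\leftarrow D}[R(H,w)]$. We can bound this by the maximum it attains over all $w$, giving us

$$\Pr_{H\leftarrow D,\ w\leftarrow|\psi_H\rangle}[R(H,w)] \le \left(\max_{w}\Pr_{H\leftarrow D}[R(H,w)]\right)\sum_{w}\|\mathrm{proj}_{\mathrm{span}\{|\psi_{H'}\rangle\}}(|w\rangle)\|^2 .$$

Now, let $\{|b_i\rangle\}$ be an orthonormal basis for $\mathrm{span}\{|\psi_{H'}\rangle\}$. Then

$$\|\mathrm{proj}_{\mathrm{span}\{|\psi_{H'}\rangle\}}(|w\rangle)\|^2 = \sum_i|\langle b_i|w\rangle|^2 .$$

Summing over all $w$ gives

$$\sum_w\|\mathrm{proj}_{\mathrm{span}\{|\psi_{H'}\rangle\}}(|w\rangle)\|^2 = \sum_w\sum_i|\langle b_i|w\rangle|^2 = \sum_i\sum_w|\langle b_i|w\rangle|^2 .$$

Since the $w$ are the possible results of measurement, the vectors $|w\rangle$ form an orthonormal basis for the whole space, meaning $\sum_w|\langle b_i|w\rangle|^2 = \||b_i\rangle\|^2 = 1$. Hence, the sum just becomes the number of $|b_i\rangle$, which is just the dimension of the space spanned by the $|\psi_{H'}\rangle$. Thus,

$$\Pr_{H\leftarrow D,\ w\leftarrow|\psi_H\rangle}[R(H,w)] \le \left(\max_{w}\Pr_{H\leftarrow D}[R(H,w)]\right)\dim\mathrm{span}\{|\psi_{H'}\rangle\} .$$

But $\dim\mathrm{span}\{|\psi_{H'}\rangle\}$ is exactly $\mathrm{rank}(\Psi_{A,\mathcal{H}})=\mathrm{rank}(A,\mathcal{H})$, which finishes the proof of the theorem. □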
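The last step of the proof uses the fact that, for any subspace $V$, summing $\|\mathrm{proj}_V(|w\rangle)\|^2$ over an orthonormal basis $\{|w\rangle\}$ gives $\dim V$. The Python sketch below checks this numerically, with randomly generated vectors standing in for the final states $|\psi_H\rangle$ (they are built to span a 3-dimensional subspace of an 8-dimensional space). Gram-Schmidt with a fixed tolerance is an implementation choice for the sketch, and the helper names are hypothetical.

```python
import random


def inner(u, v):
    """<u|v>, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-9):
    """Modified Gram-Schmidt: an orthonormal basis {|b_i>} for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = inner(b, w)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = abs(inner(w, w)) ** 0.5
        if norm > tol:                      # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def total_projection_weight(vectors, dim):
    """Sum over the computational basis |w> of ||proj_span(vectors)(|w>)||^2,
    using ||proj(|w>)||^2 = sum_i |<b_i|w>|^2 = sum_i |b_i[w]|^2."""
    basis = orthonormal_basis(vectors)
    return sum(abs(b[w]) ** 2 for b in basis for w in range(dim))


def random_vector(dim):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(dim)]


random.seed(1)
dim = 8
generators = [random_vector(dim) for _ in range(3)]
# six stand-ins for final states |psi_H>, all inside the span of three generators
states = []
for _ in range(6):
    coeffs = random_vector(3)
    states.append([sum(c * g[i] for c, g in zip(coeffs, generators)) for i in range(dim)])

print(len(orthonormal_basis(states)), round(total_projection_weight(states, dim), 6))
```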

We now move to the specific case of oracle access. $\mathcal{H}$ is now some set of functions from $\mathcal{X}$ to $\mathcal{Y}$, and our algorithm $A$ makes $q$ quantum oracle queries to a function $H\in\mathcal{H}$. Concretely, $A$ is specified by $q+1$ unitary matrices $U_i$, and the final state of $A$ on input $H$ is the state

$$U_qHU_{q-1}\cdots U_1HU_0|0\rangle$$

where $H$ is the unitary transformation mapping $|x,y,z\rangle$ into $|x,y+H(x),z\rangle$, representing an oracle query to the function $H$. To use the rank method (Theorem 3.2) for our purposes, we need to bound the rank of such an algorithm. First, we define the following quantity:

$$C_{k,q,n} = \sum_{r=0}^{q}\binom{k}{r}(n-1)^r .$$
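For small parameters, $C_{k,q,n}$ can be tabulated and compared with a brute-force count of the vectors in $\{0,\dots,n-1\}^k$ with at most $q$ non-zero coordinates, a count that reappears in the proof of Theorem 3.3 and in Section 4.2. A minimal Python sketch (function names are hypothetical):

```python
from itertools import product
from math import comb


def C(k, q, n):
    """C_{k,q,n} = sum_{r=0}^{q} binom(k, r) (n - 1)^r."""
    return sum(comb(k, r) * (n - 1) ** r for r in range(q + 1))


def count_sparse_vectors(k, q, n):
    """Brute force: vectors in {0, ..., n-1}^k with at most q non-zero coordinates."""
    return sum(1 for y in product(range(n), repeat=k) if sum(v != 0 for v in y) <= q)


print(C(4, 2, 3), count_sparse_vectors(4, 2, 3))  # 33 33
```

Since `math.comb(k, r)` is 0 for $r>k$, the sketch also shows that $C_{k,q,n}=n^k$ whenever $q\ge k$.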

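Theorem 3.3. Let $\mathcal{X}$ and $\mathcal{Y}$ be sets of size $m$ and $n$, and let $H_0$ be some function from $\mathcal{X}$ to $\mathcal{Y}$. Let $S$ be a subset of $\mathcal{X}$ of size $k$, and let $\mathcal{H}$ be some set of functions from $\mathcal{X}$ to $\mathcal{Y}$ that are equal to $H_0$ except possibly on points in $S$. If $A$ is a quantum algorithm making $q$ queries to an oracle drawn from $\mathcal{H}$, then

$$\mathrm{rank}(A,\mathcal{H}) \le C_{k,q,n} .$$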

Proof. Let $|\psi^q_H\rangle$ be the final state of a quantum algorithm after $q$ quantum oracle calls to an oracle $H\in\mathcal{H}$. We wish to bound the dimension of the space spanned by the vectors $|\psi^q_H\rangle$ for all $H\in\mathcal{H}$. We accomplish this by exhibiting a spanning set for this space. Our spanning set consists of the vectors $|\psi^q_{H'}\rangle$ where $H'$ differs from $H_0$ on at most $q$ points in $S$. We need to show two things: that this set consists of $C_{k,q,n}$ vectors, and that it does in fact span the whole space.

We first count the number of vectors by counting the number of $H'$ oracles. For each $r$, there are $\binom{k}{r}$ ways of picking the subset $T$ of size $r$ from $S$ where $H'$ will differ from $H_0$. For each subset $T$, there are $n^r$ possible functions $H'$. However, if any value $x\in T$ satisfies $H'(x)=H_0(x)$, then this is equivalent to a case where we remove $x$ from $T$, and we would have already counted this case for a smaller value of $r$. Thus, we can assume $H'(x)\neq H_0(x)$ for all $x$ in $T$. There are $(n-1)^r$ such functions. Summing over all $r$ from $0$ to $q$, we get that the number of distinct $H'$ oracles is

$$\sum_{r=0}^{q}\binom{k}{r}(n-1)^r = C_{k,q,n} .$$

Next, we need to show that the $|\psi^q_{H'}\rangle$ vectors span the entire space of $|\psi^q_H\rangle$ vectors. We first introduce some notation: let $|\psi^0\rangle$ be the state of a quantum algorithm before any quantum queries. Let $|\psi^q_H\rangle$ be the state after $q$ quantum oracle calls to the oracle $H$. Let

$$\mathbf{M}^q_H = U_qHU_{q-1}H\cdots U_1H .$$

Then $|\psi^q_H\rangle = \mathbf{M}^q_H|\psi^0\rangle$.

We note that since $|\psi^0\rangle$ is fixed for any algorithm, it is sufficient to prove that the matrices $\mathbf{M}^q_H$ are spanned by the $\mathbf{M}^q_{H'}$.

For any subset $T$ of $S$ and a function $F:T\to\mathcal{Y}$, let $J_{T,F}$ be the oracle such that

$$J_{T,F}(x) = \begin{cases}F(x) & \text{if } x\in T\\ H_0(x) & \text{otherwise.}\end{cases}$$


Let $\mathbf{M}_{T,H}$ denote $\mathbf{M}^q_{J_{T,H}}$. In other words, $\mathbf{M}_{T,H}$ is the transformation matrix corresponding to the oracle that is equal to $H$ on the set $T$, and equal to $H_0$ elsewhere. We claim that any $\mathbf{M}^q_H$ for $H\in\mathcal{H}$ is a linear combination of the matrices $\mathbf{M}_{T,H}$ for subsets $T$ of $S$ of size at most $q$. We will fix a particular $H$, and for convenience of notation, we will let $J_T=J_{T,H}$. That is, $J_T$ is the oracle that is equal to $H$ on the set $T$ and $H_0$ otherwise. We will also let $\mathbf{M}_T=\mathbf{M}_{T,H}$ and $\mathbf{M}^q=\mathbf{M}^q_H$. That is, $\mathbf{M}^q$ is the transition matrix corresponding to the oracle $H$, and $\mathbf{M}_T$ is the transition matrix corresponding to using the oracle $J_T$. For the singleton set $\{x\}$, we will also let $J_x=J_{\{x\}}$. We make the following observations:

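$$H = \left(\sum_{x\in S}J_x\right) - (k-1)H_0 \tag{3.1}$$

$$J_T = \left(\sum_{x\in T}J_x\right) - (|T|-1)H_0 \tag{3.2}$$

These identities can be seen by applying each side to the different inputs. Next, we take $\mathbf{M}^q$ and $\mathbf{M}_T$ and expand out the $H$ and $J_T$ terms using Equations 3.1 and 3.2:

$$\mathbf{M}^q = U_q\left(\left(\sum_{x\in S}J_x\right)-(k-1)H_0\right)U_{q-1}\cdots U_1\left(\left(\sum_{x\in S}J_x\right)-(k-1)H_0\right) \tag{3.3}$$

$$\mathbf{M}_T = U_q\left(\left(\sum_{x\in T}J_x\right)-(|T|-1)H_0\right)U_{q-1}\cdots U_1\left(\left(\sum_{x\in T}J_x\right)-(|T|-1)H_0\right) \tag{3.4}$$

Let $J_\perp=H_0$. For a vector $\mathbf{r}\in(S\cup\{\perp\})^q$, let

$$\mathbf{P}_{\mathbf{r}} = U_qJ_{r_q}U_{q-1}\cdots J_{r_2}U_1J_{r_1} .$$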
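Equation 3.1 can be checked directly on small examples by writing each oracle as a permutation matrix on the basis states $|x,y\rangle$. The Python sketch below assumes $\mathcal{Y}=\mathbb{Z}_n$ and drops the untouched $z$ register; `check_identity` compares $H$ with $\left(\sum_{x\in S}J_x\right)-(|S|-1)H_0$ entry by entry. Since $J_T$ agrees with $H_0$ outside $T$, the same function with $S=T$ also checks Equation 3.2. All names are hypothetical.

```python
from itertools import product


def oracle_matrix(f, m, n):
    """Permutation matrix of |x, y> -> |x, y + f(x) mod n>; |x, y> has index x * n + y."""
    dim = m * n
    M = [[0] * dim for _ in range(dim)]
    for x, y in product(range(m), range(n)):
        M[x * n + (y + f[x]) % n][x * n + y] = 1
    return M


def patched(H0, H, T):
    """J_T: the oracle equal to H on the set T and to H0 everywhere else."""
    return [H[x] if x in T else H0[x] for x in range(len(H0))]


def check_identity(H0, H, S, n):
    """Compare H with (sum_{x in S} J_x) - (|S| - 1) H0, entry by entry.
    This is Equation 3.1 when H agrees with H0 outside S."""
    m = len(H0)
    rhs = [[-(len(S) - 1) * v for v in row] for row in oracle_matrix(H0, m, n)]
    for x in S:
        Jx = oracle_matrix(patched(H0, H, {x}), m, n)
        rhs = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(rhs, Jx)]
    return oracle_matrix(H, m, n) == rhs


H0 = [0, 1, 2, 0]
H = [2, 0, 2, 0]                                    # differs from H0 only on S = {0, 1}
print(check_identity(H0, H, {0, 1}, 3))             # True
T = {0}
print(check_identity(H0, patched(H0, H, T), T, 3))  # Equation 3.2 for T = {0}: True
```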

For a particular $\mathbf{r}$, we wish to expand the $\mathbf{M}^q$ and $\mathbf{M}_T$ matrices in terms of the $\mathbf{P}_{\mathbf{r}}$ matrices. If $d$ of the components of $\mathbf{r}$ are $\perp$, then the coefficient of $\mathbf{P}_{\mathbf{r}}$ in the expansion of $\mathbf{M}^q$ is $(-1)^d(k-1)^d$. If, in addition, all of the other components of $\mathbf{r}$ lie in $T$, then the coefficient in the expansion of $\mathbf{M}_T$ is $(-1)^d(|T|-1)^d$ (if any of the components of $\mathbf{r}$ lie outside of $T$, the coefficient is $0$).

Now, we claim that, for some values $a_\ell$, we have

$$\mathbf{M}^q = \sum_{\ell=0}^{q}a_\ell\sum_{T\subseteq S:|T|=\ell}\mathbf{M}_T .$$

To accomplish this, we look for the coefficient of $\mathbf{P}_{\mathbf{r}}$ in the expansion of the right-hand side of this equation. Fix an $\ell$. Let $d$ be the number of components of $\mathbf{r}$ equal to $\perp$, and let $p$ be the number of distinct component values other than $\perp$. Notice that $p+d\le q$. Then there are $\binom{k-p}{\ell-p}$ different sets $T$ of size $\ell$ for which all of the values of the components lie in $T$. Thus, the coefficient of $\mathbf{P}_{\mathbf{r}}$ is

$$\sum_{\ell=p}^{q}a_\ell\binom{k-p}{\ell-p}(-1)^d(\ell-1)^d .$$

Therefore, we need values $a_\ell$ such that

$$\sum_{\ell=p}^{q}a_\ell\binom{k-p}{\ell-p}(\ell-1)^d = (k-1)^d \tag{3.5}$$

for all $d,p$ with $p+d\le q$. Notice that we can instead phrase this problem as a polynomial interpolation problem. The right-hand side of Equation 3.5 is a polynomial $P$ of degree $d\le q-p$, evaluated at $k-1$. We can interpolate this polynomial using the points $\ell-1$ for $\ell=p,\dots,q$, obtaining

$$P(k-1) = \sum_{\ell=p}^{q}P(\ell-1)\prod_{j=p,\,j\neq\ell}^{q}\frac{k-j}{\ell-j} .$$

The numerator of the product evaluates to

$$\frac{(k-p)!}{(k-\ell)\,(k-q-1)!} ,$$

while to evaluate the denominator, we split it into two parts: $j=p,\dots,\ell-1$ and $j=\ell+1,\dots,q$. The first part evaluates to $(\ell-p)!$, and the second part evaluates to $(-1)^{q-\ell}(q-\ell)!$. With a little algebraic manipulation, we have that

$$P(k-1) = \sum_{\ell=p}^{q}P(\ell-1)\,(-1)^{q-\ell}\binom{k-\ell-1}{k-q-1}\binom{k-p}{\ell-p}$$

for all polynomials $P(x)$ of degree at most $q-p$. Setting $P(x)=x^d$ for $d=0,\dots,q-p$, we see that Equation 3.5 is satisfied if

$$a_\ell = (-1)^{q-\ell}\binom{k-\ell-1}{k-q-1} .$$

□
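The closed form for the $a_\ell$ can be sanity-checked against Equation 3.5 with exact integer arithmetic. The Python sketch below assumes $q<k$, so that $k-q-1\ge 0$, and ranges over all $p,d$ with $p+d\le q$; the helper names are hypothetical.

```python
from math import comb


def a(ell, k, q):
    """a_ell = (-1)^(q - ell) * binom(k - ell - 1, k - q - 1), for 0 <= ell <= q < k."""
    return (-1) ** (q - ell) * comb(k - ell - 1, k - q - 1)


def satisfies_eq_3_5(k, q):
    """Check sum_{ell=p}^{q} a_ell binom(k-p, ell-p) (ell-1)^d == (k-1)^d for p + d <= q."""
    for p in range(q + 1):
        for d in range(q - p + 1):
            lhs = sum(a(ell, k, q) * comb(k - p, ell - p) * (ell - 1) ** d
                      for ell in range(p, q + 1))
            if lhs != (k - 1) ** d:
                return False
    return True


print(all(satisfies_eq_3_5(k, q) for k in range(2, 9) for q in range(k)))  # True
```

The same $a_\ell$ work for every $p$ and $d$, which is what allows a single linear combination of the $\mathbf{M}_T$ to reproduce every coefficient of $\mathbf{M}^q$.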


3.1  An  Example 

Suppose our task is, given one quantum query to an oracle $H:\mathcal{X}\to\mathcal{Y}$, to produce two distinct pairs $(x_0,y_0)$ and $(x_1,y_1)$ such that $H(x_0)=y_0$ and $H(x_1)=y_1$. Suppose further that $H$ is drawn from a pairwise independent set $\mathcal{H}$. We will now see that the rank method leads to a bound on the success probability of any quantum algorithm $A$.

Corollary 3.4. No quantum algorithm $A$, making a single query to a function $H:\mathcal{X}\to\mathcal{Y}$ drawn from a pairwise independent set $\mathcal{H}$, can produce two distinct input/output pairs of $H$, except with probability at most $|\mathcal{X}|/|\mathcal{Y}|$.

Proof. Let $m=|\mathcal{X}|$ and $n=|\mathcal{Y}|$. Since no outputs of $H$ are fixed, we will set $S=\mathcal{X}$ in Theorem 3.3, showing that the rank of the algorithm $A$ is bounded by $C_{m,1,n}=1+m(n-1)<mn$. If an algorithm makes no queries to $H$, the best it can do at outputting two distinct input/output pairs is to just pick two arbitrary distinct pairs, and output those. By pairwise independence, the probability that this zero-query algorithm succeeds is at most $1/n^2$. Then Theorem 3.2 tells us that $A$ succeeds with probability at most $\mathrm{rank}(A,\mathcal{H})$ times this amount, which equates to $mn/n^2=m/n$. □

For  m  >  n,  this  bound  is  trivial.  However,  for  m  smaller  than  n,  this  gives  a  non-trivial  bound, 
and  for  m  exponentially  smaller  than  n,  this  bound  is  negligible. 

4  Outputting  Values  of  a  Random  Oracle 

In this section, we will prove Theorem 1.1. We consider the following problem: given $q$ quantum queries to a random oracle $H:\mathcal{X}\to\mathcal{Y}$, produce $k>q$ distinct pairs $(x_i,y_i)$ such that $y_i=H(x_i)$. Let $n=|\mathcal{Y}|$ be the size of the range. Motivated by our application to quantum-accessible MACs, we are interested in the case where the range $\mathcal{Y}$ of the oracle is large, and we want to show that producing even one extra input/output pair ($k=q+1$) is impossible, except with negligible probability. We are also interested in the case where the range of the oracle, though large, is far smaller than the domain. Thus, the bound we obtained in the previous section (Corollary 3.4) is not sufficient for our purposes, since it is only non-trivial if the range is larger than the domain.

In the classical setting, when $k\le q$, this problem is easy, since we can just pick an arbitrary set of $k$ different $x_i$ values and query the oracle on each value. For $k>q$, no adversary of even unbounded complexity can solve this problem, except with probability $1/n^{k-q}$, since for any set of $k$ inputs, at least $k-q$ of the corresponding outputs are completely unknown to the adversary. Therefore, for large $n$, we have a sharp threshold: for $k\le q$, this problem can be solved efficiently with probability 1, and for even $k=q+1$, this problem cannot be solved, even inefficiently, except with negligible probability.
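For tiny parameters, the classical threshold can be confirmed by exhaustive enumeration. The Python sketch below averages, over every function $H:\{0,\dots,k-1\}\to\{0,\dots,n-1\}$, the success of one particular classical strategy chosen for illustration (query the first $q$ inputs, guess $0$ for the remaining $k-q$). Its success probability comes out to exactly $1/n^{k-q}$. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def classical_success(k, q, n):
    """Exact success probability of 'query q points, guess the other k - q' (q <= k)
    over a uniformly random H : {0, ..., k-1} -> {0, ..., n-1}."""
    wins = 0
    for H in product(range(n), repeat=k):          # every oracle, all equally likely
        answers = [H[x] for x in range(q)]         # q classical queries
        guesses = [0] * (k - q)                    # fixed guesses for unqueried points
        wins += (answers + guesses) == list(H)
    return Fraction(wins, n ** k)


print(classical_success(4, 2, 3))  # 1/9
```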

In the quantum setting, the $k\le q$ case is the same as before, since we can still query the
oracle  classically.  However,  for  k  >  q,  the  quantum  setting  is  more  challenging.  The  adversary  can 
potentially  query  the  random  oracle  on  a  superposition  of  all  inputs,  so  he  “sees”  the  output  of  the 
oracle  on  all  points.  Proving  that  it  is  still  impossible  to  produce  k  input/output  pairs  is  thus  more 
complicated,  and  existing  methods  fail  to  prove  that  this  problem  is  difficult.  Therefore,  it  is  not 
immediately  clear  that  we  have  the  same  sharp  threshold  as  before. 

In  Section  4.1  we  use  the  rank  method  to  bound  the  probability  that  any  (even  computationally 
unbounded)  quantum  adversary  succeeds.  Then  in  Section  4.2  we  show  that  our  bound  is  tight  by 
giving an efficient algorithm for this problem that achieves the lower bound. In particular, for an oracle $H:\mathcal{X}\to\mathcal{Y}$, we consider two cases:

• Exponentially large range $\mathcal{Y}$ and polynomial $k,q$. In this case, we will see that the success probability even when $k=q+1$ is negligible. That is, producing even one additional input/output pair is hard. Thus, we get the same sharp threshold as in the classical case.

• Constant-size range $\mathcal{Y}$ and polynomial $k,q$. We show that even when $q$ is a constant fraction of $k$, we can still produce $k$ input/output pairs with overwhelming probability using only $q$ quantum queries. This is in contrast to the classical case, where the success probability for $q=ck$, $c<1$, is negligible in $k$.
4.1  A  Tight  Upper  Bound

Theorem 4.1. Let $A$ be a quantum algorithm making $q$ queries to a random oracle $H:\mathcal{X}\to\mathcal{Y}$ whose range has size $n$, and producing $k>q$ pairs $(x_i,y_i)\in\mathcal{X}\times\mathcal{Y}$. The probability that the $x_i$ values are distinct and $y_i=H(x_i)$ for all $i\in[k]$ is at most $\frac{1}{n^k}C_{k,q,n}$.

Proof. Before giving the complete proof, we sketch the special case where $k$ is equal to the size of the domain. In this case, any quantum algorithm that outputs $k$ distinct input/output pairs must output all input/output pairs. Similar to the proof of Corollary 3.4, we will set $S=\mathcal{X}$, and use Theorem 3.3 to bound the rank of $A$ by $C_{k,q,n}$. Now, any algorithm making zero queries succeeds with probability at most $1/n^k$. Theorem 3.2 then bounds the success probability of any $q$-query algorithm as

$$\frac{1}{n^k}C_{k,q,n} .$$

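Now for the general proof: first, we will assume that the probability $A$ outputs any particular sequence of $x_i$ values is independent of the oracle $H$. We will show how to remove this assumption later. We can thus write

$$|\psi^q_H\rangle = \sum_{\mathbf{x}}\alpha_{\mathbf{x}}|\mathbf{x}\rangle|\phi_{H,\mathbf{x}}\rangle$$

where the $\alpha_{\mathbf{x}}$ are complex numbers whose squared magnitudes sum to one, and $|\mathbf{x}\rangle|\phi_{H,\mathbf{x}}\rangle$ is the normalized projection of $|\psi^q_H\rangle$ onto the space spanned by $|\mathbf{x},w\rangle$ for all $w$. The probability that $A$ succeeds is equal to

$$\sum_{H}\Pr[H]\sum_{\mathbf{x}}|\langle\mathbf{x},H(\mathbf{x})|\psi^q_H\rangle|^2 = \sum_{H}\Pr[H]\sum_{\mathbf{x}}|\alpha_{\mathbf{x}}|^2\,|\langle H(\mathbf{x})|\phi_{H,\mathbf{x}}\rangle|^2 .$$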

H  x  H  x 

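First, we reorder the sums so the outer sum is the sum over $\mathbf{x}$. Now, we write $H=(H_0,H_1)$, where $H_0$ is the oracle restricted to the components of $\mathbf{x}$, and $H_1$ is the oracle restricted to all other inputs. Thus, with $m=|\mathcal{X}|$ so that $\Pr[H]=1/n^m$, our probability is:

$$\frac{1}{n^m}\sum_{\mathbf{x}}|\alpha_{\mathbf{x}}|^2\sum_{H_0,H_1}|\langle H_0(\mathbf{x})|\phi_{(H_0,H_1),\mathbf{x}}\rangle|^2 .$$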

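Using the same trick as we did before, we can replace $|\langle H(\mathbf{x})|\phi_{H,\mathbf{x}}\rangle|$ with the quantity

$$\|\mathrm{proj}_{\mathrm{span}\{|\phi_{(H_0,H_1),\mathbf{x}}\rangle\}}(|H_0(\mathbf{x})\rangle)\| ,$$

which is bounded by

$$\|\mathrm{proj}_{\mathrm{span}\{|\phi_{(H_0',H_1),\mathbf{x}}\rangle\}}(|H_0(\mathbf{x})\rangle)\|$$

as we vary $H_0'$ over oracles whose domain is the components of $\mathbf{x}$. The probability of success is then bounded by

$$\frac{1}{n^m}\sum_{\mathbf{x}}|\alpha_{\mathbf{x}}|^2\sum_{H_1}\sum_{H_0}\|\mathrm{proj}_{\mathrm{span}\{|\phi_{(H_0',H_1),\mathbf{x}}\rangle\}}(|H_0(\mathbf{x})\rangle)\|^2 .$$

We now perform the sum over $H_0$. As in the proof of Theorem 3.2, the sum evaluates to $\dim\mathrm{span}\{|\phi_{(H_0',H_1),\mathbf{x}}\rangle\}$. Since the $|\phi_{(H_0',H_1),\mathbf{x}}\rangle$ vectors are projections of the $|\psi^q_{(H_0',H_1)}\rangle$, this dimension is bounded by $\dim\mathrm{span}\{|\psi^q_{(H_0',H_1)}\rangle\}$. Let $\mathcal{H}$ be the set of oracles $(H_0',H_1)$ as we vary $H_0'$, and consider $A$ acting on oracles in $\mathcal{H}$. Fix some oracle $H_0$ from among the $H_0'$ oracles, and let $S$ be the set of components of $\mathbf{x}$. Then $(H_0',H_1)$ differs from $(H_0,H_1)$ only on the elements of $S$. Since $|S|\le k$, Theorem 3.3 tells us that $\mathrm{rank}(A,\mathcal{H})\le C_{k,q,n}$. But

$$\mathrm{rank}(A,\mathcal{H}) = \dim\mathrm{span}\{|\psi^q_{(H_0',H_1)}\rangle\} .$$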


Therefore, we can bound the success probability by

$$\frac{1}{n^m}\sum_{\mathbf{x}}|\alpha_{\mathbf{x}}|^2\sum_{H_1}C_{k,q,n} .$$

Summing over all $n^{m-k}$ different $H_1$ values and all $\mathbf{x}$ values gives a bound of

$$\frac{1}{n^k}C_{k,q,n} ,$$

as desired.

So far, we have assumed that $A$ produces $\mathbf{x}$ with probability independent of $H$. Now, suppose our algorithm $A$ does not produce $\mathbf{x}$ with probability independent of the oracle. We construct a new algorithm $B$ with access to $H$ that does the following: pick a random oracle $O$ with the same domain and range as $H$, and give $A$ the oracle $H+O$ that maps $x$ into $H(x)+O(x)$. When $A$ produces $k$ input/output pairs $(x_i,y_i)$, output the pairs $(x_i,y_i-O(x_i))$. The pairs $(x_i,y_i)$ are input/output pairs of $H+O$ if and only if the pairs $(x_i,y_i-O(x_i))$ are input/output pairs of $H$. Further, $A$ still sees a random oracle, so it succeeds with the same probability as before. Moreover, the oracle $A$ sees is now independent of $H$, so $B$ outputs $\mathbf{x}$ with probability independent of $H$. Thus, applying the above analysis to $B$ shows that $B$, and hence $A$, produces $k$ input/output pairs with probability at most

$$\frac{1}{n^k}C_{k,q,n} .$$

□


For this paper, we are interested in the case where $n=|\mathcal{Y}|$ is exponentially large, and we are only allowed a polynomial number of queries. Suppose $k=q+1$, the easiest non-trivial case for the adversary. Then, the probability of success is at most

$$\frac{1}{n^{q+1}}\sum_{r=0}^{q}\binom{q+1}{r}(n-1)^r = 1-\left(1-\frac{1}{n}\right)^{q+1} \le \frac{q+1}{n} . \tag{4.1}$$


Therefore,  to  produce  even  one  extra  input/output  pair  is  impossible,  except  with  exponentially 
small  probability,  just  like  in  the  classical  case.  This  proves  the  first  part  of  Theorem  1.1. 
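Equation 4.1 can be checked with exact rational arithmetic. The Python sketch below computes the bound $C_{q+1,q,n}/n^{q+1}$ of Theorem 4.1 and compares it with the closed form and with $(q+1)/n$, the last inequality being Bernoulli's inequality. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def C(k, q, n):
    """C_{k,q,n} = sum_{r=0}^{q} binom(k, r) (n - 1)^r."""
    return sum(comb(k, r) * (n - 1) ** r for r in range(q + 1))


def bound_one_extra_pair(q, n):
    """Theorem 4.1 bound C_{q+1,q,n} / n^(q+1) for k = q + 1, as an exact fraction."""
    return Fraction(C(q + 1, q, n), n ** (q + 1))


q, n = 3, 2 ** 16
bound = bound_one_extra_pair(q, n)
assert bound == 1 - (1 - Fraction(1, n)) ** (q + 1)   # the closed form in Equation 4.1
assert bound <= Fraction(q + 1, n)                     # Bernoulli's inequality
print(float(bound), (q + 1) / n)
```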


4.2  The  Optimal  Attack 

In this section, we present a quantum algorithm for the problem of computing $H(x_i)$ for $k$ different $x_i$ values, given only $q<k$ queries:

Theorem 4.2. Let $\mathcal{X}$ and $\mathcal{Y}$ be sets, and fix integers $q<k$ and $k$ distinct values $x_1,\dots,x_k\in\mathcal{X}$. There exists a quantum algorithm $A$ that makes $q$ non-adaptive quantum queries to any function $H:\mathcal{X}\to\mathcal{Y}$, and produces $H(x_1),\dots,H(x_k)$ with probability $C_{k,q,n}/n^k$, where $n=|\mathcal{Y}|$.
The algorithm is similar to the algorithm of [vD98], though generalized to handle arbitrary range sizes. This algorithm has the same success probability as the bound in Theorem 4.1, showing that both our attack and the lower bound of Theorem 4.1 are optimal. This proves the second part of Theorem 1.1.

Proof. Assume that $\mathcal{Y}=\{0,\dots,n-1\}$. For a vector $\mathbf{y}\in\mathcal{Y}^k$, let $\Delta(\mathbf{y})$ be the number of coordinates of $\mathbf{y}$ that do not equal 0. Also, assume that $x_i=i$.

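Initially, prepare the state that is a uniform superposition of all vectors $\mathbf{y}\in\mathcal{Y}^k$ such that $\Delta(\mathbf{y})\le q$:

$$|\psi_1\rangle = \frac{1}{\sqrt{C_{k,q,n}}}\sum_{\mathbf{y}:\Delta(\mathbf{y})\le q}|\mathbf{y}\rangle$$

Notice that the number of vectors of length $k$ with at most $q$ non-zero coordinates is exactly

$$\sum_{r=0}^{q}\binom{k}{r}(n-1)^r = C_{k,q,n} .$$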

We can prepare the state $|\psi_1\rangle$ efficiently as follows. Let $\mathrm{Setup}_{k,q,n}:[C_{k,q,n}]\to\mathcal{Y}^k$ be the following function: on input $\ell\in[C_{k,q,n}]$,

• Check if $\ell\le C_{k-1,q,n}$. If so, compute the vector $\mathbf{y}'=\mathrm{Setup}_{k-1,q,n}(\ell)$, and output the vector $\mathbf{y}=(0,\mathbf{y}')$.

• Otherwise, let $\ell'=\ell-C_{k-1,q,n}$. It is easy to verify that $\ell'\in[(n-1)C_{k-1,q-1,n}]$.

• Let $\ell''\in[C_{k-1,q-1,n}]$ and $y_0\in\mathcal{Y}\setminus\{0\}$ be the unique integers such that $\ell'=(n-1)(\ell''-1)+y_0$.

• Let $\mathbf{y}'=\mathrm{Setup}_{k-1,q-1,n}(\ell'')$, and output the vector $\mathbf{y}=(y_0,\mathbf{y}')$.

The algorithm relies on the observation that a vector $\mathbf{y}$ of length $k$ with at most $q$ non-zero coordinates falls into one of two categories:

• The first coordinate is 0, and the remaining $k-1$ coordinates form a vector with at most $q$ non-zero coordinates.

• The first coordinate is non-zero, and the remaining $k-1$ coordinates form a vector with at most $q-1$ non-zero coordinates.

There are $C_{k-1,q,n}$ vectors of the first type, and $C_{k-1,q-1,n}$ vectors of the second type for each possible setting of the first coordinate to something other than 0. Therefore, we divide $[C_{k,q,n}]$ into two parts: the first $C_{k-1,q,n}$ integers map to vectors of the first type, and the remaining $(n-1)C_{k-1,q-1,n}$ integers map to vectors of the second type.

We note that Setup is efficiently computable, invertible, and its inverse is also efficiently computable. Therefore, we can prepare $|\psi_1\rangle$ by first preparing the state

$$\frac{1}{\sqrt{C_{k,q,n}}}\sum_{\ell\in[C_{k,q,n}]}|\ell\rangle$$

and reversibly converting this state into $|\psi_1\rangle$ using $\mathrm{Setup}_{k,q,n}$.

Next, let $F : \mathcal{Y}^k \to [k]^q$ be the function that outputs the indexes $i$ such that $y_i \ne 0$, in order of increasing $i$. If there are fewer than $q$ such indexes, the function fills the remaining spaces with the first indexes $i$ such that $y_i = 0$. If there are more than $q$ such indexes, the function truncates to the first $q$. $F$ is computable by a simple classical algorithm, so it can be implemented reversibly as a quantum algorithm. Apply this algorithm to $|\psi_1\rangle$, obtaining the state

$$|\psi_2\rangle = \frac{1}{\sqrt{C_{k,q,n}}} \sum_{y : \Delta(y) \le q} |y, F(y)\rangle$$

Next, let $G : \mathcal{Y}^k \to \mathcal{Y}^q$ be the function that takes in a vector $y$, computes $x = F(y)$, and outputs the vector $(y_{x_1}, y_{x_2}, \dots, y_{x_q})$. In other words, it outputs the vector of the non-zero components of $y$, padded with zeros if needed. This function is also efficiently computable by a classical algorithm, so we can apply it to each part of the superposition:

$$|\psi_3\rangle = \frac{1}{\sqrt{C_{k,q,n}}} \sum_{y : \Delta(y) \le q} |y, F(y), G(y)\rangle$$

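Now we apply the Fourier transform to the $G(y)$ part, obtaining

$$|\psi_4\rangle = \frac{1}{\sqrt{C_{k,q,n}\, n^q}} \sum_{y : \Delta(y) \le q} |y, F(y)\rangle \sum_{z \in \mathcal{Y}^q} e^{-\frac{2\pi i}{n}\langle z, G(y)\rangle} |z\rangle$$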

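Now we can apply $H$ to the $F(y)$ part using $q$ non-adaptive queries, adding the answer to the $z$ part. The result is the state

$$|\psi_5\rangle = \frac{1}{\sqrt{C_{k,q,n}\, n^q}} \sum_{y : \Delta(y) \le q} |y, F(y)\rangle \sum_{z \in \mathcal{Y}^q} e^{-\frac{2\pi i}{n}\langle z, G(y)\rangle} |z + H(F(y))\rangle$$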

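Substituting $z \mapsto z - H(F(y))$ in the inner sum, we can rewrite this last state as follows:

$$|\psi_5\rangle = \frac{1}{\sqrt{C_{k,q,n}\, n^q}} \sum_{y : \Delta(y) \le q} e^{\frac{2\pi i}{n}\langle H(F(y)), G(y)\rangle} |y, F(y)\rangle \sum_{z \in \mathcal{Y}^q} e^{-\frac{2\pi i}{n}\langle z, G(y)\rangle} |z\rangle$$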


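Now, notice that $H(F(y))$ is the vector of $H$ applied to the indexes where $y$ is non-zero, and that $G(y)$ is the vector of values of $y$ at those points (any padded positions contribute 0, since $G$ pads with zeros). Thus the inner product is

$$\langle H(F(y)), G(y)\rangle = \sum_{i : y_i \ne 0} H(i) \cdot y_i = \sum_{i \in [k]} H(i)\, y_i = \langle H([k]), y\rangle.$$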
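As a sanity check on this identity, the classical functions $F$ and $G$ can be written out directly. The sketch below is a hypothetical Python illustration (0-based indexes, $H$ given as the tuple $(H(0), \dots, H(k-1))$, and $q \le k$ assumed); it exercises only the classical bookkeeping, not the quantum steps.

```python
from itertools import product


def F(y: tuple, q: int) -> tuple:
    """Indexes of non-zero coordinates (increasing), padded with indexes of zero
    coordinates, truncated to length q (assumes q <= len(y))."""
    nonzero = [i for i, v in enumerate(y) if v != 0]
    zero = [i for i, v in enumerate(y) if v == 0]
    return tuple((nonzero + zero)[:q])


def G(y: tuple, q: int) -> tuple:
    """The coordinates of y at the indexes F(y, q)."""
    return tuple(y[i] for i in F(y, q))


def inner(u, v, n: int) -> int:
    """Inner product mod n."""
    return sum(a * b for a, b in zip(u, v)) % n


def identity_holds(H: tuple, q: int, n: int) -> bool:
    """Check <H(F(y)), G(y)> = <H([k]), y> mod n for every y with at most q non-zeros."""
    for y in product(range(n), repeat=len(H)):
        if sum(v != 0 for v in y) > q:
            continue
        lhs = inner([H[i] for i in F(y, q)], G(y, q), n)
        if lhs != inner(H, y, n):
            return False
    return True
```

For example, `F((0, 2, 0, 1), 3)` lists the non-zero positions 1 and 3 and pads with position 0, and `G` then reads off the values `(2, 1, 0)`.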

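The next step is to uncompute the $z$ and $F(y)$ registers (applying the inverse Fourier transform to $z$ recovers $G(y)$, and both $G(y)$ and $F(y)$ are computable from $y$), obtaining

$$|\psi_6\rangle = \frac{1}{\sqrt{C_{k,q,n}}} \sum_{y : \Delta(y) \le q} e^{\frac{2\pi i}{n}\langle H([k]), y\rangle} |y\rangle$$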


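Lastly, we perform a Fourier transform on the remaining space, obtaining

$$|\psi_7\rangle = \frac{1}{\sqrt{C_{k,q,n}\, n^k}} \sum_{z \in \mathcal{Y}^k} \Big( \sum_{y : \Delta(y) \le q} e^{\frac{2\pi i}{n}\langle H([k]) - z, y\rangle} \Big) |z\rangle$$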

Now measure. The probability we obtain $H([k])$ is

$$\frac{1}{C_{k,q,n}\, n^k} \Big| \sum_{y : \Delta(y) \le q} 1 \Big|^2 = \frac{C_{k,q,n}^2}{C_{k,q,n}\, n^k} = \frac{C_{k,q,n}}{n^k},$$

as desired.

□
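For tiny parameters the last two steps can be checked numerically. The NumPy sketch below is our own illustration: it builds the amplitudes of $|\psi_6\rangle$ directly (it does not simulate the queries), applies the $k$-dimensional Fourier transform with `numpy.fft.fftn` (whose kernel $e^{-2\pi i\langle z, y\rangle/n}$ with `norm="ortho"` matches the transform used above), and returns the probability of measuring $H([k])$, which should equal $C_{k,q,n}/n^k$.

```python
import itertools

import numpy as np


def success_probability(H: tuple, q: int, n: int) -> float:
    """Probability that measuring |psi_7> yields H([k]) = (H(0), ..., H(k-1))."""
    k = len(H)
    amp = np.zeros((n,) * k, dtype=complex)
    for y in itertools.product(range(n), repeat=k):
        if sum(v != 0 for v in y) <= q:          # only y with Delta(y) <= q
            phase = sum(h * v for h, v in zip(H, y)) % n
            amp[y] = np.exp(2j * np.pi * phase / n)
    amp /= np.linalg.norm(amp)                   # the 1/sqrt(C_{k,q,n}) factor
    out = np.fft.fftn(amp, norm="ortho")         # sum_y exp(-2*pi*i<z,y>/n)/sqrt(n^k)
    return float(abs(out[tuple(H)]) ** 2)
```

For instance, with $k = 4$, $q = 2$, $n = 4$ the formula predicts $C_{4,2,4}/4^4 = (1 + 12 + 54)/256 = 67/256$.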
As we have already seen, for exponentially large $\mathcal{Y}$, this attack has negligible advantage for any $k > q$. However, if $n = |\mathcal{Y}|$ is constant, we can do better. The error probability is

$$1 - \frac{C_{k,q,n}}{n^k} = \sum_{\ell = q+1}^{k} \binom{k}{\ell} \Big(1 - \frac{1}{n}\Big)^{\ell} \Big(\frac{1}{n}\Big)^{k-\ell}.$$

This is the probability that $k$ independent coin flips, where each coin is heads with probability $1/n$, yield fewer than $k - q$ heads. Using the Chernoff bound, if $q > k(1 - 1/n)$, this probability is at most

$$e^{-\frac{2}{k}\left(q - k(1 - 1/n)\right)^2}.$$

For a constant $n$, let $c$ be any constant with $1 - 1/n < c < 1$. If we use $q = ck$ queries, the error probability is less than

$$e^{-\frac{2}{k}\left(k(c + 1/n - 1)\right)^2} = e^{-2(c + 1/n - 1)^2 k},$$

which is exponentially small in $k$. Thus, for constant $n$ and any constant $c$ with $1 - 1/n < c < 1$, using $q = ck$ quantum queries, we can determine $k$ input/output pairs with overwhelming probability. This is in contrast to the classical case, where with any constant fraction of $k$ queries, we can only produce $k$ input/output pairs with negligible probability. As an example, if $H$ outputs two bits (so $n = 4$), it is possible to produce $k$ input/output pairs of $H$ using only $q = 0.8k$ quantum queries. However, with $0.8k$ classical queries, we can output $k$ input/output pairs with probability at most

$$4^{-0.2k} < 0.76^k.$$
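The exact error probability and the Chernoff-style bound above can be compared numerically. This is a hypothetical Python sketch (`exact_error` and `chernoff_bound` are our names) for the two-bit example, $n = 4$ and $q = 0.8k$; the last printed column is the classical success bound $4^{-0.2k}$.

```python
from math import comb, exp


def exact_error(k: int, q: int, n: int) -> float:
    """1 - C_{k,q,n}/n^k: probability that more than q of the k coordinates are non-zero."""
    bad = sum(comb(k, l) * (n - 1) ** l for l in range(q + 1, k + 1))
    return bad / n**k


def chernoff_bound(k: int, q: int, n: int) -> float:
    """exp(-(2/k)(q - k(1 - 1/n))^2), meaningful when q > k(1 - 1/n)."""
    return exp(-2 * (q - k * (1 - 1 / n)) ** 2 / k)


# Two-bit outputs (n = 4) with q = 0.8k quantum queries.
for k in (100, 500, 1000):
    q = 4 * k // 5
    print(k, exact_error(k, q, 4), chernoff_bound(k, q, 4), 4 ** (-0.2 * k))
```

Summing the tail directly (rather than computing $1 - C_{k,q,n}/n^k$) avoids cancellation when the error is tiny.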

5 Quantum-Accessible MACs

Using Theorem 4.1 we can now show that a quantum-secure pseudorandom function [Zha12b] gives rise to a quantum-secure MAC, namely $S(k, m) = \mathrm{PRF}(k, m)$. We prove that this MAC is secure.

Theorem 5.1. If $\mathrm{PRF} : \mathcal{K} \times \mathcal{X} \to \mathcal{Y}$ is a quantum-secure pseudorandom function and $1/|\mathcal{Y}|$ is negligible, then $S(k, m) = \mathrm{PRF}(k, m)$ is an EUF-qCMA-secure MAC.

Proof. Let $A$ be a polynomial-time adversary that makes $q$ quantum queries to $S(k, \cdot)$ and produces $q + 1$ valid input/output pairs with probability $\epsilon$. Let Game 0 be the standard quantum MAC attack game, where $A$ makes $q$ quantum queries to $S(k, \cdot)$. By definition, $A$'s success probability in this game is $\epsilon$.

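Let Game 1 be the same as Game 0, except that $S(k, \cdot)$ is replaced with a truly random function $O : \mathcal{X} \to \mathcal{Y}$, and define $A$'s success probability as the probability that $A$ outputs $q + 1$ input/output pairs of $O$. Since PRF is a quantum-secure PRF, $A$ cannot distinguish Game 0 from Game 1 with non-negligible advantage; because success can be checked with $q + 1$ classical queries to the oracle, $A$'s success probabilities in the two games differ by a negligible amount.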

Now, in Game 1, $A$ makes $q$ quantum queries to a random oracle and tries to produce $q + 1$ input/output pairs. However, by Theorem 4.1 and Eq. (4.1), $A$'s success probability is bounded by $(q + 1)/|\mathcal{Y}|$, which is negligible. It follows that $\epsilon$ is negligible, and therefore $S$ is an EUF-qCMA-secure MAC. □
5.1 Carter-Wegman MACs

In this section, we show how to modify the Carter-Wegman MAC so that it is secure in the quantum setting presented in Section 2.2. Recall that $\mathcal{H}$ is an XOR-universal family of hash functions from $\mathcal{X}$ into $\mathcal{Y}$ if for any two distinct points $x$ and $y$, and any constant $c \in \mathcal{Y}$,

$$\Pr_{H \leftarrow \mathcal{H}}[H(x) - H(y) = c] = 1/|\mathcal{Y}|.$$
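For the affine family $H(x) = ax + b$ over $\mathbb{Z}_p$ (the example family used later in this section), the defining property can be verified exhaustively for a small modulus. The following Python sketch is our own illustration; it reads "$-$" as subtraction mod $p$, and the property holds only when $p$ is prime.

```python
from itertools import product


def is_xor_universal_affine(p: int) -> bool:
    """Exhaustively check Pr_{a,b}[H(x) - H(y) = c] = 1/p for H(x) = a*x + b mod p."""
    for x, y in product(range(p), repeat=2):
        if x == y:
            continue
        hits = [0] * p                     # hits[c] = #{(a, b) : H(x) - H(y) = c}
        for a, b in product(range(p), repeat=2):
            hits[((a * x + b) - (a * y + b)) % p] += 1
        if any(h != p for h in hits):      # p*p keys must spread evenly over p values
            return False
    return True
```

For a composite modulus such as 4, the difference $a(x - y)$ misses some values (e.g. $x - y = 2$), so the check fails.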

The Carter-Wegman construction uses a pseudorandom function family PRF with domain $\mathcal{X}$ and range $\mathcal{Y}$, and an XOR-universal family of hash functions $\mathcal{H}$ from $\mathcal{M}$ to $\mathcal{Y}$. The key is a pair $(k, H)$, where $k$ is a key for PRF and $H$ is a function drawn from $\mathcal{H}$. To sign a message $m$, pick a random $r \in \mathcal{X}$, and return $(r, \mathrm{PRF}(k, r) + H(m))$.

This MAC is not, in general, secure in the quantum setting presented in Section 2.2. The reason is that the same randomness is used in all slots of a quantum chosen message query; that is, the signing oracle computes

$$\sum_m \alpha_m |m\rangle \longmapsto \sum_m \alpha_m |m, r, \mathrm{PRF}(k, r) + H(m)\rangle,$$

where the same $r$ is used for all classical states of the superposition. For example, suppose $\mathcal{H}$ is the set of functions $H(x) = ax + b$ for random $a$ and $b$. With even a single quantum query, the adversary will be able to obtain $a$ and $\mathrm{PRF}(k, r) + b$ with high probability, using the algorithm from Theorem 6.2 in Section 6. Knowing both of these allows the adversary to forge a tag on any message.
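The following Python sketch illustrates the construction and, through a classical caricature, why a shared $r$ is fatal. It is hypothetical throughout: HMAC-SHA256 reduced mod $p = 2^{61} - 1$ stands in for PRF (our assumption, not the paper's), messages are elements of $\mathbb{Z}_p$, $H(m) = am + b$, and one quantum query is modeled by listing its slots and tagging them all with a single $r$. A real quantum adversary cannot read every slot; it obtains $a$ and $\mathrm{PRF}(k, r) + b$ via the algorithm of Theorem 6.2 instead.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1  # prime modulus; X = Y = M = Z_P (illustrative assumption)


def prf(k: bytes, r: int) -> int:
    """Stand-in PRF: HMAC-SHA256 of r, reduced mod P."""
    digest = hmac.new(k, r.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def keygen():
    """Key (k, a, b): PRF key k and hash H(m) = a*m + b mod P."""
    return secrets.token_bytes(32), secrets.randbelow(P), secrets.randbelow(P)


def sign_slots(key, slots):
    """Tag every slot of one (caricatured) query with the SAME random r."""
    k, a, b = key
    r = secrets.randbelow(P)
    return [(m, r, (prf(k, r) + a * m + b) % P) for m in slots]


def verify(key, m: int, r: int, s: int) -> bool:
    """Accept iff PRF(k, r) + H(m) = s."""
    k, a, b = key
    return (prf(k, r) + a * m + b) % P == s


# Two slots sharing r reveal a and PRF(k, r) + b, which suffices to tag any message.
key = keygen()
(m0, r, s0), (m1, _, s1) = sign_slots(key, [3, 10])
a = (s1 - s0) * pow(m1 - m0, -1, P) % P
offset = (s0 - a * m0) % P              # equals PRF(k, r) + b
forged = (offset + a * 99) % P          # a valid tag on the unqueried message 99
assert verify(key, 99, r, forged)
```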

We show how to modify the standard Carter-Wegman MAC to make it secure in the quantum setting.

Construction 1 (Quantum Carter-Wegman). The Quantum Carter-Wegman MAC (QCW-MAC) is built from a pseudorandom function PRF, an XOR-universal set of functions $\mathcal{H}$, and a pairwise independent set of functions $\mathcal{R}$.

Keys: The secret key for QCW-MAC is a pair $(k, H)$, where $k$ is a key for PRF and $H : \mathcal{M} \to \mathcal{Y}$ is drawn from $\mathcal{H}$.

Signing: To sign a message $m$, choose a random $R \in \mathcal{R}$ and output the pair $(R(m), \mathrm{PRF}(k, R(m)) + H(m))$ as the tag. When responding to a quantum chosen message query, the same $R$ is used in all classical states of the superposition.

Verification: To verify that $(r, s)$ is a valid tag for $m$, accept iff $\mathrm{PRF}(k, r) + H(m) = s$.
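A hypothetical Python caricature of QCW-MAC follows, under the same stand-in assumptions as a plain Carter-Wegman sketch would use (HMAC-SHA256 mod $p = 2^{61} - 1$ as PRF, $H(m) = am + b$). Each query draws one pairwise-independent function, here assumed to be $R(m) = cm + d \bmod p$, and applies it to every slot. Distinct slots now receive distinct nonces $R(m)$ (unless $c = 0$), so simply differencing two tags that share a nonce is no longer possible. The sketch shows only the mechanics; quantum security is the content of Theorem 5.2.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1  # prime modulus; X = Y = M = Z_P (illustrative assumption)


def prf(k: bytes, r: int) -> int:
    """Stand-in PRF: HMAC-SHA256 of r, reduced mod P."""
    digest = hmac.new(k, r.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def keygen():
    """Key (k, a, b): PRF key k and hash H(m) = a*m + b mod P."""
    return secrets.token_bytes(32), secrets.randbelow(P), secrets.randbelow(P)


def sign_slots(key, slots):
    """One query: draw a single R(m) = c*m + d mod P and use it in every slot."""
    k, a, b = key
    c, d = secrets.randbelow(P), secrets.randbelow(P)
    tags = []
    for m in slots:
        r = (c * m + d) % P             # nonce now depends on the message
        tags.append((m, r, (prf(k, r) + a * m + b) % P))
    return tags


def verify(key, m: int, r: int, s: int) -> bool:
    """Accept iff PRF(k, r) + H(m) = s."""
    k, a, b = key
    return (prf(k, r) + a * m + b) % P == s


key = keygen()
tags = sign_slots(key, [3, 10])
assert all(verify(key, m, r, s) for m, r, s in tags)
```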

Theorem 5.2. The Quantum Carter-Wegman MAC is an EUF-qCMA-secure MAC.

Proof. We start with an adversary $A$ that makes $q$ tag queries and then produces $q + 1$ valid message/tag pairs with probability $\epsilon$. We now adapt the classical Carter-Wegman security proof to our MAC in the quantum setting.

When the adversary makes query $i$ on the superposition

$$\sum_{m,y,z} \alpha_{m,y,z} |m, y, z\rangle,$$

the challenger responds with the superposition

$$\sum_{m,y,z} \alpha_{m,y,z} |m, y + S_i(m), z\rangle,$$

where $S_i(m) = (R_i(m), \mathrm{PRF}(k, R_i(m)) + H(m))$ for a randomly chosen $R_i \in \mathcal{R}$, where $\mathcal{R}$ is a pairwise independent set of functions.

The adversary then creates $q + 1$ triples $(m_j, r_j, s_j)$ which, with probability $\epsilon$, are valid message/tag tuples. That means $H(m_j) + \mathrm{PRF}(k, r_j) = s_j$ for all $j$.

We  now  prove  that  e  must  be  small  using  a  sequence  of  games: 

Game 0: Run the standard MAC game, responding to query $i$ with the oracle that maps $m$ to $(R_i(m), \mathrm{PRF}(k, R_i(m)) + H(m))$, where $R_i$ is a random function from $\mathcal{R}$. The advantage of $A$ in this game is the probability it produces $q + 1$ forgeries. Denote this advantage as $\epsilon_0$, which is equal to $\epsilon$.

Game 1: Replace $\mathrm{PRF}(k, \cdot)$ with a truly random function $F$, and denote the advantage in this game as $\epsilon_1$. Since PRF is a quantum-secure PRF, $\epsilon_1$ is negligibly close to $\epsilon_0$.

Game 2: Next we change the goal of the adversary. The adversary is now asked to produce a triple $(m_0, m_1, s)$ where $H(m_0) - H(m_1) = s$. Given an adversary $A$ for Game 1, we construct an adversary $B$ for Game 2 as follows: run $A$, obtaining $q + 1$ forgeries $(m_j, r_j, s_j)$ such that $H(m_j) + F(r_j) = s_j$ with probability $\epsilon_1$. If all $r_j$ are distinct, abort. Otherwise, assume without loss of generality that $r_0 = r_1$. Then

$$H(m_0) - H(m_1) = (s_0 - F(r_0)) - (s_1 - F(r_1)) = s_0 - s_1,$$

so output $(m_0, m_1, s_0 - s_1)$. Let $\epsilon_2$ be the advantage of $B$ in this game. Let $p$ be the probability that all $r_j$ are distinct and $A$ succeeds. Then $\epsilon_2 \ge \epsilon_1 - p$.

We wish to bound $p$. Define a new algorithm $C$ with oracle access to $F$ that first generates $H$ and then runs $A$, playing the role of challenger to $A$; each of $A$'s queries can be answered with a single query to $F$, so $C$ makes $q$ quantum queries. When $A$ outputs $q + 1$ triples $(m_j, r_j, s_j)$, $C$ outputs $q + 1$ pairs $(r_j, s_j - H(m_j))$. If $A$ succeeded, then $H(m_j) + F(r_j) = s_j$, so $F(r_j) = s_j - H(m_j)$, meaning the pairs $C$ outputs are all input/output pairs of $F$. If all the $r_j$ are distinct, then $C$ outputs $q + 1$ input/output pairs, which is impossible except with probability at most $(q + 1)/|\mathcal{Y}|$. Therefore, $p \le (q + 1)/|\mathcal{Y}|$. As long as $|\mathcal{Y}|$ is super-polynomial in size, $p$ is negligible, meaning $\epsilon_2$ is negligibly close to $\epsilon_1$.

Game 3: Now modify the game so that we draw each $R_i$ uniformly at random from the set of all oracles. Notice that each $R_i$ is queried only once, and a single quantum query cannot distinguish a pairwise-independent function from a truly random one (Lemma 6.4 with $c = 0$ and $q = 1$). Hence Game 3 looks exactly like Game 2 from the point of view of the adversary, and the success probability $\epsilon_3$ is equal to $\epsilon_2$.

Game 4: For this game, we answer query $i$ with the oracle that maps $m$ to $(R_i(m), F(R_i(m)))$. That is, we ignore $H$ when answering MAC queries. Let $\epsilon_4$ be the success probability in this game. To prove that $\epsilon_4$ is negligibly close to $\epsilon_3$, we need the following lemma:

Lemma 5.3. Consider two distributions $D_1$ and $D_2$ on oracles from $\mathcal{M}$ into $\mathcal{X} \times \mathcal{Y}$:

• $D_1$: generate a random oracle $R : \mathcal{M} \to \mathcal{X}$ and a random oracle $P : \mathcal{M} \to \mathcal{Y}$, and output the oracle that maps $m$ to $(R(m), P(m))$.

• $D_2$: generate a random oracle $R : \mathcal{M} \to \mathcal{X}$ and a random oracle $F : \mathcal{X} \to \mathcal{Y}$, and output the oracle that maps $m$ to $(R(m), F(R(m)))$.

Then the probability that any $q$-quantum-query algorithm distinguishes $D_1$ from $D_2$ is at most

$$O\big(q^2 / |\mathcal{X}|^{1/3}\big).$$

Proof. Let $B$ be a quantum algorithm making $q$ quantum queries that distinguishes $D_1$ from $D_2$ with probability $\lambda$. We will now define a quantum algorithm $C$ that is given $r$ samples $(s_i, t_i) \in \mathcal{X} \times \mathcal{Y}$, where the $s_i$ are chosen randomly, and the $t_i$ are either chosen randomly or are equal to $T(s_i)$ for a randomly chosen function $T : \mathcal{X} \to \mathcal{Y}$. $C$'s goal is to distinguish these two cases. Notice that as long as the $s_i$ are distinct, these two distributions are identical. Therefore, $C$'s distinguishing probability is at most the probability of a collision, which is at most $O(r^2/|\mathcal{X}|)$.

$C$ works as follows: generate a random oracle $A : \mathcal{M} \to [r]$. Let $R(m) = s_{A(m)}$ and $P(m) = t_{A(m)}$, and give $B$ the oracle $m \mapsto (R(m), P(m))$. If the $t_i$ are random, then we have the oracle that maps $m$ to $(s_{A(m)}, t_{A(m)})$. This is exactly the small-range distribution of Zhandry [Zha12b], and is indistinguishable from $D_1$ except with probability $O(q^3/r)$.

Similarly, if $t_i = T(s_i)$, then the oracle maps $m$ to $(s_{A(m)}, T(s_{A(m)}))$. The oracle that maps $m$ to $s_{A(m)}$ is also a small-range distribution, so it is indistinguishable from a random oracle except with probability $O(q^3/r)$. If we replace $s_{A(m)}$ with a random oracle, we get exactly the distribution $D_2$. Thus, $D_2$ is indistinguishable from $m \mapsto (s_{A(m)}, T(s_{A(m)}))$ except with probability $O(q^3/r)$.

Therefore, $C$'s success probability at distinguishing its two cases is at least $\lambda - O(q^3/r)$, and is at most $O(r^2/|\mathcal{X}|)$. This means the distinguishing probability of $B$ is at most

$$\lambda \le O\Big(\frac{q^3}{r} + \frac{r^2}{|\mathcal{X}|}\Big).$$

This is minimized by choosing $r = \Theta(q\,|\mathcal{X}|^{1/3})$, which gives a distinguishing probability of at most $O(q^2/|\mathcal{X}|^{1/3})$.
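To see where $r = \Theta(q\,|\mathcal{X}|^{1/3})$ comes from, write $f(r)$ for the bound on $\lambda$ (ignoring constants) and balance its two terms. This is our own expansion of the step, not part of the original proof:

```latex
f(r) = \frac{q^3}{r} + \frac{r^2}{|\mathcal{X}|}, \qquad
f'(r) = -\frac{q^3}{r^2} + \frac{2r}{|\mathcal{X}|} = 0
\;\Longrightarrow\;
r = q\left(\frac{|\mathcal{X}|}{2}\right)^{1/3} = \Theta\!\left(q\,|\mathcal{X}|^{1/3}\right).

% With r = q |X|^{1/3}, both terms coincide:
\frac{q^3}{r} = \frac{q^2}{|\mathcal{X}|^{1/3}}, \qquad
\frac{r^2}{|\mathcal{X}|} = \frac{q^2\,|\mathcal{X}|^{2/3}}{|\mathcal{X}|} = \frac{q^2}{|\mathcal{X}|^{1/3}},
\qquad\text{so}\qquad
\lambda \le O\!\left(\frac{q^2}{|\mathcal{X}|^{1/3}}\right).
```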

□ 

We show that $\epsilon_4$ is negligibly close to $\epsilon_3$ using a sequence of sub-games. Game 3a is the game where we answer query $i$ with the oracle that maps $m$ to $(R_i(m), P_i(m) + H(m))$, where $P_i$ is another random oracle. Notice that we can define oracles $R(i, m) = R_i(m)$ and $P(i, m) = P_i(m)$. Then $R$ and $P$ are random oracles, and using the above lemma, the success probability of $B$ in Game 3a is negligibly close to that of Game 3. Notice that since $P_i$ is random, $P'_i(m) = P_i(m) + H(m)$ is also random, so Game 3a is equivalent to the game where we answer query $i$ with the oracle that maps $m$ to $(R_i(m), P_i(m))$. Using the above lemma again, the success probability of $B$ in this game is negligibly close to that of Game 4.

Now, we claim that $\epsilon_4$, the success probability in Game 4, is negligible. Indeed, the view of $B$ is independent of $H$, so by XOR-universality the probability that $H(m_0) - H(m_1) = s$ (for $m_0 \ne m_1$) is $1/|\mathcal{Y}|$. Since $\epsilon_4$ is negligibly close to $\epsilon_0 = \epsilon$, the advantage of $A$ is also negligible. □

6  q-time  MACs 

In this section, we develop quantum one-time MACs, that is, MACs that are secure when the adversary can issue only one quantum chosen message query. More generally, we will study quantum $q$-time MACs.

Classically,  any  pairwise  independent  function  is  a  one-time  MAC.  In  the  quantum  setting, 
Corollary  3.4  shows  that  when  the  range  is  much  larger  than  the  domain,  this  still  holds.  However, 
such  MACs  are  not  useful  since  we  want  the  tag  to  be  short.  We  first  show  that  when  the  range  is 
not  larger  than  the  domain,  pairwise  independence  is  not  enough  to  ensure  security: 

Theorem 6.1. For any set $\mathcal{Y}$ of prime-power size, and any set $\mathcal{X}$ with $|\mathcal{X}| \ge |\mathcal{Y}|$, there exist $(q + 1)$-wise independent functions from $\mathcal{X}$ to $\mathcal{Y}$ that are not $q$-time MACs.
To prove this theorem, we treat $\mathcal{Y}$ as a finite field and assume $\mathcal{X} = \mathcal{Y}$, as our results are easy to generalize to larger domains. We use random degree-$q$ polynomials as our $(q + 1)$-wise independent family, and show in Theorem 6.2 below that such polynomials can be completely recovered using only $q$ quantum queries. It follows that the derived MAC cannot be $q$-time secure, since once the adversary has the polynomial it can easily forge tags on new messages.

Theorem 6.2. For any prime power $n$, there is an efficient quantum algorithm that makes only $q$ quantum queries to an oracle implementing a degree-$q$ polynomial $F : \mathbb{F}_n \to \mathbb{F}_n$, and completely determines $F$ with probability $1 - O(qn^{-1})$.

The theorem shows that a $(q + 1)$-wise independent family is not necessarily a secure quantum $q$-time MAC, since after $q$ quantum chosen message queries the adversary extracts the entire secret key. The case $q = 1$ is particularly interesting, since it shows that pairwise independence does not suffice for a quantum one-time MAC. The following lemma will be used to prove Theorem 6.2:

Lemma 6.3. For any prime power $n$, and any subset $\mathcal{X} \subseteq \mathbb{F}_n$ of size $n - k$, there is an efficient quantum algorithm that makes a single quantum query to any degree-1 polynomial $F : \mathcal{X} \to \mathbb{F}_n$, and completely determines $F$ with probability $1 - O(kn^{-1})$.

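Proof. Write $F(x) = ax + b$ for values $a, b \in \mathbb{F}_n$, and write $n = p^t$ for some prime $p$ and integer $t$. We design an algorithm to recover $a$ and $b$.

Initialize the quantum registers to the state

$$|\psi_1\rangle = \frac{1}{\sqrt{n-k}} \sum_{x \in \mathcal{X}} |x, 0\rangle.$$

Next, make a single oracle query to $F$, obtaining

$$|\psi_2\rangle = \frac{1}{\sqrt{n-k}} \sum_{x \in \mathcal{X}} |x, ax + b\rangle.$$

Note that we can interpret elements $z \in \mathbb{F}_n$ as vectors $z \in \mathbb{F}_p^t$. Let $\langle y, z\rangle$ be the inner product of vectors $y, z \in \mathbb{F}_p^t$. Multiplication by $a$ in $\mathbb{F}_n$ is a linear transformation over the vector space $\mathbb{F}_p^t$, and can therefore be represented by a matrix $M_a \in \mathbb{F}_p^{t \times t}$. Thus, we can write

$$|\psi_2\rangle = \frac{1}{\sqrt{n-k}} \sum_{x \in \mathcal{X}} |x, M_a x + b\rangle.$$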


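Note that in the case $t = 1$, $a$ is a scalar in $\mathbb{F}_p$, so $M_a$ is just the scalar $a$. Now, the algorithm applies the Fourier transform to both registers, to obtain

$$|\psi_3\rangle = \frac{1}{n\sqrt{n-k}} \sum_{y, z} \Big( \sum_{x \in \mathcal{X}} \omega_p^{\langle x, y\rangle + \langle M_a x + b, z\rangle} \Big) |y, z\rangle,$$

where $\omega_p$ is a complex primitive $p$th root of unity. The term in parentheses can be written as

$$\sum_{x \in \mathcal{X}} \omega_p^{\langle x, y\rangle + \langle x, M_a^{T} z\rangle + \langle b, z\rangle} = \omega_p^{\langle b, z\rangle} \sum_{x \in \mathcal{X}} \omega_p^{\langle x, y + M_a^{T} z\rangle}.$$

We then change variables, setting $y' = y + M_a^{T} z$. Therefore, we can write the state as

$$|\psi_3\rangle = \frac{1}{n\sqrt{n-k}} \sum_{y', z} \Big( \sum_{x \in \mathcal{X}} \omega_p^{\langle x, y'\rangle} \Big) \omega_p^{\langle b, z\rangle} |y' - M_a^{T} z, z\rangle.$$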

For $z \ne 0$ and $y' = 0$, we will now explain how to recover $a$ from $(-M_a^{T} z, z)$. Notice that the transformation that takes $a$ and outputs $-M_a^{T} z$ is a linear transformation. Call this transformation $L_z$. The coefficients of $L_z$ are easily computable, given $z$, by applying the transformation to each of the unit vectors. Notice that if $t = 1$, $L_z$ is just the scalar $-z$. We claim that $L_z$ is invertible if $z \ne 0$. Suppose there is some $a \ne 0$ such that $L_z a = -M_a^{T} z = 0$. Since $z \ne 0$, this means the linear operator $-M_a^{T}$ is not invertible, so neither is $-M_a$. But $-M_a$ is just multiplication by $-a$ in the field $\mathbb{F}_n$. This multiplication is only non-invertible if $-a = 0$, meaning $a = 0$, a contradiction. Therefore, the kernel of $L_z$ is just 0, so the map is invertible.

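Therefore, to compute $a$, compute the inverse operator $L_z^{-1}$ and apply it to $-M_a^{T} z$, interpreting the result as a field element in $\mathbb{F}_n$. The result is $a$. More specifically, for $z \ne 0$, apply the computation mapping $(y, z)$ to $(L_z^{-1} y, z)$, which will take $(-M_a^{T} z, z)$ to $(a, z)$. For $z = 0$, we will just apply the identity map, leaving both registers as is. This map is reversible, meaning this computation can be implemented as a quantum computation. The result is the state

$$|\psi_4\rangle = \frac{1}{n\sqrt{n-k}} \Bigg[ \sum_{y',\, z \ne 0} \Big( \sum_{x \in \mathcal{X}} \omega_p^{\langle x, y'\rangle} \Big) \omega_p^{\langle b, z\rangle} |L_z^{-1} y' + a, z\rangle + \sum_{y'} \Big( \sum_{x \in \mathcal{X}} \omega_p^{\langle x, y'\rangle} \Big) |y', 0\rangle \Bigg]$$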


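We will now get rid of the $|y', 0\rangle$ terms by measuring whether $z = 0$. The probability that $z = 0$ is $1/n$, and in this case, we abort. Otherwise, we are left in the state

$$|\psi_5\rangle = \frac{1}{\sqrt{n(n-1)(n-k)}} \sum_{y',\, z \ne 0} \Big( \sum_{x \in \mathcal{X}} \omega_p^{\langle x, y'\rangle} \Big) \omega_p^{\langle b, z\rangle} |L_z^{-1} y' + a, z\rangle$$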


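The algorithm then measures the first register. Recall that $\mathcal{X}$ has size $n - k$. The outcome is $a$ exactly when $y' = 0$, so the probability that the outcome of the measurement is $a$ is $(n-1)(n-k)^2 / \big(n(n-1)(n-k)\big) = 1 - k/n$. In this case, we are left in the state

$$|\psi_6\rangle = \frac{1}{\sqrt{n-1}} \sum_{z \ne 0} \omega_p^{\langle b, z\rangle} |z\rangle$$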


Next, the algorithm applies the inverse Fourier transform to the second register, arriving at the state

$$|\psi_7\rangle = \frac{1}{\sqrt{n(n-1)}} \sum_{w} \Big( \sum_{z \ne 0} \omega_p^{\langle b - w, z\rangle} \Big) |w\rangle$$


Now the algorithm measures again and interprets the resulting vector as a field element. The probability that the result is $b$ is $1 - 1/n$. Therefore, with probability $(1 - k/n)(1 - 1/n)^2 = 1 - O(k/n)$, the algorithm outputs both $a$ and $b$.

□ 
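For $t = 1$ (so $n = p$ is prime, $M_a = a$ and $L_z = -z$) the whole algorithm can be simulated on a $p \times p$ state vector. The NumPy sketch below is our own illustration, not the paper's code: `numpy.fft.ifft2` with `norm="ortho"` realizes the Fourier transform with kernel $\omega_p^{\langle x, y\rangle}$ on both registers, `numpy.fft.fft` realizes the inverse transform on the second register, and the function returns the probability that the procedure ends with $a$ in the first register and $b$ in the second. The analysis above predicts $(1 - k/p)(1 - 1/p)^2$.

```python
import numpy as np


def recovery_probability(a: int, b: int, p: int, excluded: set) -> float:
    """Probability that the single-query algorithm outputs (a, b) for F(x) = a*x + b
    restricted to X = Z_p minus `excluded` (p prime)."""
    X = [x for x in range(p) if x not in excluded]
    psi = np.zeros((p, p), dtype=complex)
    for x in X:                               # |psi_2> = sum_x |x, a x + b> / sqrt(|X|)
        psi[x, (a * x + b) % p] = 1 / np.sqrt(len(X))
    psi = np.fft.ifft2(psi, norm="ortho")     # Fourier transform on both registers
    mapped = psi.copy()                       # column z = 0: identity map
    for z in range(1, p):                     # (y, z) -> (L_z^{-1} y, z) = (-y/z, z)
        inv_z = pow(z, -1, p)
        for y in range(p):
            mapped[(-y * inv_z) % p, z] = psi[y, z]
    row = mapped[a].copy()                    # project first register onto |a>
    row[0] = 0                                # ... and second register onto z != 0
    final = np.fft.fft(row, norm="ortho")     # inverse Fourier transform on register 2
    return float(abs(final[b]) ** 2)
```

With $p = 7$ and one excluded point, for example, the predicted success probability is $(6/7)^3$ for every choice of $a$ and $b$.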
Now we use this attack to obtain an attack on degree-$q$ polynomials for general $q$:

Proof of Theorem 6.2. We show how to recover the $q + 1$ different coefficients of any degree-$q$ polynomial, using only $q - 1$ classical queries and a single quantum query.

Let $a$ be the coefficient of $x^q$, and $b$ the coefficient of $x^{q-1}$ in $F(x)$. First, make $q - 1$ classical queries to arbitrary distinct points $\{x_1, \dots, x_{q-1}\}$. Using standard interpolation techniques, let $Z(x)$ be the unique polynomial of degree at most $q - 2$ such that $Z(x_i) = F(x_i)$. Let $G(x) = F(x) - Z(x)$. $G(x)$ is a polynomial of degree $q$ that is zero on the $x_i$, so it factors, allowing us to write

$$F(x) = Z(x) + (a'x + b') \prod_{i=1}^{q-1} (x - x_i).$$

Since $Z$ has degree at most $q - 2$, expanding the product shows that $a = a'$ and $b = b' - a \sum_i x_i$. Therefore, we can implement an oracle mapping $x$ to $a(x + \sum_i x_i) + b$ as follows:

• Query $F$ on $x$, obtaining $F(x)$.

• Compute $Z(x)$, and let $G(x) = F(x) - Z(x)$.

• Output $G(x) / \prod_i (x - x_i) = a(x + \sum_i x_i) + b$.

This oracle works on all inputs except the $q - 1$ different $x_i$ values. Running the algorithm from Lemma 6.3 on $\mathcal{X} = \mathbb{F}_n \setminus \{x_i\}$, we recover with probability $1 - O(q/n)$ both $a$ and $b + a \sum_i x_i$ using a single quantum query, from which we can compute $a$ and $b$. Along with the $F(x_i)$ values, we can then reconstruct the entire polynomial. □
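The classical part of this reduction, interpolating through the $q - 1$ classical answers and dividing out $\prod_i (x - x_i)$, can be sketched as follows. This is a hypothetical Python illustration over a prime field $\mathbb{Z}_p$ (the theorem allows prime powers); `derived_oracle` returns an ordinary function, whereas in the attack the same computation would be run reversibly around the single quantum query.

```python
def lagrange_eval(points: list, x: int, p: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def derived_oracle(F, xs: list, p: int):
    """Given F of degree q and q - 1 distinct points xs, return x -> a(x + sum(xs)) + b."""
    points = [(xi, F(xi)) for xi in xs]              # the q - 1 classical queries

    def oracle(x: int) -> int:                       # only defined for x not in xs
        g = (F(x) - lagrange_eval(points, x, p)) % p  # G(x) = F(x) - Z(x)
        prod = 1
        for xi in xs:
            prod = prod * (x - xi) % p
        return g * pow(prod, -1, p) % p

    return oracle


# Example: F(x) = 9x^3 + 2x^2 + 7x + 4 over Z_11, so q = 3, a = 9, b = 2.
p, xs = 11, [1, 5]


def F(x: int) -> int:
    return (9 * x**3 + 2 * x**2 + 7 * x + 4) % p


oracle = derived_oracle(F, xs, p)
```

Here $\sum_i x_i = 6$, so the derived oracle should agree with $9(x + 6) + 2 \bmod 11$ on every $x \notin \{1, 5\}$.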

6.1 Sufficient Conditions for a One-Time MAC

We show that, while pairwise independence is not enough for a one-time MAC, 4-wise independence is. We first generalize a theorem of Zhandry [Zha12a]:

Lemma 6.4. Let $A$ be any quantum algorithm that makes $c$ classical queries and $q$ quantum queries to an oracle $H$. If $H$ is drawn from a $(c + 2q)$-wise independent function, then the output distribution of $A$ is identical to the case where $H$ is truly random.

Proof. If $q = 0$, then this lemma is trivial, since the $c$ outputs $A$ sees are uniformly distributed. If $c = 0$, then the lemma reduces to the theorem of Zhandry [Zha12a]. By adapting the proof of the $c = 0$ case to the general case, we get the lemma. Our approach is similar to the polynomial method, but needs to be adapted to handle classical queries correctly.

Our quantum algorithm makes $k = c + q$ queries. Let $Q \subseteq [k]$ be the set of queries that are quantum, and let $C \subseteq [k]$ be the set of queries that are classical.

Fix an oracle $H$. Let $\delta_{x,y}$ be 1 if $H(x) = y$ and 0 otherwise. Let $\rho^{(i)}$ be the density matrix after the $i$th query, and $\rho^{(i-1/2)}$ the density matrix before the $i$th query. The final state of the algorithm is $\rho^{(k+1/2)}$.

We now claim that $\rho^{(i)}$ and $\rho^{(i+1/2)}$ are polynomials in the $\delta_{x,y}$ of degree at most $k_i$, where $k_i$ is twice the number of quantum queries made so far, plus the number of classical queries made so far.

$\rho^{(0)}$ and $\rho^{(1/2)}$ are independent of $H$, so they are not a function of the $\delta_{x,y}$ at all, meaning the degree is $0 = k_0$.

We now inductively assume our claim is true for $i - 1$, and express $\rho^{(i)}$ in terms of $\rho^{(i-1/2)}$. There are two cases:
• $i$ is a quantum query. In this case, $k_i = k_{i-1} + 2$. We can write

$$\rho^{(i)}_{x,y,z,x',y',z'} = \rho^{(i-1/2)}_{x,\,y - H(x),\,z,\,x',\,y' - H(x'),\,z'}.$$

An alternative way to write this is as

$$\rho^{(i)}_{x,y,z,x',y',z'} = \sum_{r, r'} \delta_{x, y-r}\, \delta_{x', y'-r'}\, \rho^{(i-1/2)}_{x,r,z,x',r',z'}.$$

By induction, each of the $\rho^{(i-1/2)}_{x,r,z,x',r',z'}$ is a polynomial of degree at most $k_{i-1}$ in the $\delta_{x,y}$ values, so $\rho^{(i)}_{x,y,z,x',y',z'}$ is a polynomial of degree at most $k_{i-1} + 2 = k_i$.

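• $i$ is a classical query. This means $k_i = k_{i-1} + 1$. Let $\sigma^{(i-1/2)}$ represent the state after measuring the $x$ register, but before making the actual query. This is identical to $\rho^{(i-1/2)}$ except that the entries where $x \ne x'$ are zeroed out. We can then write

$$\rho^{(i)}_{x,y,z,x',y',z'} = \sum_{r, r'} \delta_{x, y-r}\, \delta_{x', y'-r'}\, \sigma^{(i-1/2)}_{x,r,z,x',r',z'}.$$

Since only the entries with $x = x'$ survive, notice that $\delta_{x,y-r}\,\delta_{x,y'-r'}$ is zero unless $y - r = y' - r'$, in which case it just reduces to $\delta_{x,y-r}$. Therefore, we can simplify further:

$$\rho^{(i)}_{x,y,z,x,y',z'} = \sum_{r} \delta_{x, y-r}\, \sigma^{(i-1/2)}_{x,r,z,x,(y'-y)+r,z'}.$$

By induction, each of the $\sigma^{(i-1/2)}_{x,r,z,x,(y'-y)+r,z'}$ values is a polynomial of degree at most $k_{i-1}$ in the $\delta_{x,y}$ values, so $\rho^{(i)}_{x,y,z,x',y',z'}$ is a polynomial of degree at most $k_{i-1} + 1 = k_i$.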

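Since the unitaries the algorithm applies between queries do not depend on $H$, they do not increase the degree. Therefore, after all $k$ queries, the final matrix $\rho^{(k+1/2)}$ is a polynomial in the $\delta_{x,y}$ of degree at most $2q + c$. We can then write the density matrix as

$$\rho^{(k+1/2)} = \sum_{\mathbf{x}, \mathbf{y}} M_{\mathbf{x},\mathbf{y}} \prod_{t} \delta_{x_t, y_t},$$

where $\mathbf{x}$ and $\mathbf{y}$ range over vectors of length at most $2q + c$, the $M_{\mathbf{x},\mathbf{y}}$ are matrices, and the sum is over all possible vectors.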

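Now, fix a distribution $D$ on oracles $H$. The density matrix for the final state of the algorithm, when the oracle is drawn from $D$, is given by

$$\sum_{\mathbf{x}, \mathbf{y}} M_{\mathbf{x},\mathbf{y}} \Big( \sum_{H} \Pr_{D}[H] \prod_{t} \delta_{x_t, y_t} \Big).$$

The term in parentheses evaluates to $\Pr_{H \leftarrow D}[H(\mathbf{x}) = \mathbf{y}]$. Therefore, the final density matrix can be expressed as

$$\sum_{\mathbf{x}, \mathbf{y}} M_{\mathbf{x},\mathbf{y}} \Pr_{H \leftarrow D}[H(\mathbf{x}) = \mathbf{y}].$$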

Since $\mathbf{x}$ and $\mathbf{y}$ are vectors of length at most $2q + c$, if $D$ is $(2q + c)$-wise independent, $\Pr_{H \leftarrow D}[H(\mathbf{x}) = \mathbf{y}]$ evaluates to the same quantity as if $H$ were truly random. Thus the density matrices are the same. Since all of the statistical information about the final state of the algorithm is contained in the density matrix, the distributions of outputs are identical, completing the proof.

□ 
Using this lemma we show that $(3q + 1)$-wise independence is sufficient for $q$-time MACs.

Theorem 6.5. Any $(3q + 1)$-wise independent family with domain $\mathcal{X}$ and range $\mathcal{Y}$ is a quantum $q$-time secure MAC provided $(q + 1)/|\mathcal{Y}|$ is negligible.

Proof. Let $D$ be some $(3q + 1)$-wise independent function. Suppose we have an adversary $A$ that makes $q$ quantum queries to an oracle $H$ and attempts to produce $q + 1$ input/output pairs. Let $\epsilon_R$ be the probability of success when $H$ is a random oracle, and let $\epsilon_D$ be the probability of success when $H$ is drawn from $D$. We construct an algorithm $B$ with access to $H$ as follows: simulate $A$ with oracle access to $H$. When $A$ outputs $q + 1$ input/output pairs, simply make $q + 1$ classical queries to $H$ to check that these are valid pairs. Output 1 if and only if all pairs are valid. Therefore, $B$ makes $q$ quantum queries and $c = q + 1$ classical queries to $H$, and outputs 1 if and only if $A$ succeeds: if $H$ is random, $B$ outputs 1 with probability $\epsilon_R$, and if $H$ is drawn from $D$, $B$ outputs 1 with probability $\epsilon_D$. Now, since $D$ is $(3q + 1)$-wise independent and $3q + 1 = 2q + c$, Lemma 6.4 shows that the distribution of outputs when $H$ is drawn from $D$ is identical to that when $H$ is random, meaning $\epsilon_D = \epsilon_R$.

Thus, when $H$ is drawn from $D$, $A$ succeeds with the same probability that it would if $H$ were random. But we already know that if $H$ is truly random, $A$'s success probability is at most $(q + 1)/|\mathcal{Y}|$. Therefore, when $H$ is drawn from $D$, $A$ succeeds with probability at most $(q + 1)/|\mathcal{Y}|$, which is negligible. Hence, if $H$ is drawn from $D$, $H$ is a $q$-time MAC. □
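Theorem 6.5 suggests a simple information-theoretic $q$-time MAC: use as the key a uniformly random polynomial of degree at most $3q$ over a large prime field, which is a $(3q + 1)$-wise independent function. The Python sketch below is our own illustration under that assumption (field $\mathbb{Z}_p$ with $p = 2^{61} - 1$, so $(q + 1)/|\mathcal{Y}|$ is tiny for small $q$); it shows key generation, tagging, and verification only.

```python
import secrets

P = 2**61 - 1  # prime; messages and tags are elements of Z_P (illustrative assumption)


def keygen(q: int) -> list:
    """Coefficients of a uniformly random polynomial of degree at most 3q over Z_P."""
    return [secrets.randbelow(P) for _ in range(3 * q + 1)]


def tag(key: list, m: int) -> int:
    """Evaluate the keyed polynomial at m using Horner's rule."""
    acc = 0
    for c in reversed(key):
        acc = (acc * m + c) % P
    return acc


def verify(key: list, m: int, t: int) -> bool:
    """Accept iff t is the tag of m."""
    return tag(key, m) == t


key = keygen(q=1)  # one-time MAC: a random cubic, i.e. 4-wise independent
t = tag(key, 42)
```

For $q = 1$ this is exactly the 4-wise independent one-time MAC mentioned at the start of Section 6.1.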

7  Conclusion 

We introduced the rank method as a general technique for obtaining lower bounds on quantum oracle algorithms and used this method to bound the probability that a quantum algorithm can evaluate a random oracle $O : \mathcal{X} \to \mathcal{Y}$ at $k$ points using $q < k$ queries. When the range $\mathcal{Y}$ is small, say $|\mathcal{Y}| = 8$, a quantum algorithm can recover $k$ points of $O$ from only $0.9k$ queries with high probability. However, we show that when the range $\mathcal{Y}$ is large, no algorithm can produce $k$ input-output pairs of $O$ using only $k - 1$ queries with non-negligible probability. We use these bounds to construct the first MACs secure against quantum chosen message attacks. We consider both PRF and Carter-Wegman constructions. For one-time MACs we showed that pairwise independence does not ensure security, but 4-wise independence does.

These  results  suggest  many  directions  for  future  work.  First,  can  these  bounds  be  generalized  to 
signatures  to  obtain  signatures  secure  against  quantum  chosen  message  attacks?  Similarly,  can  we 
construct  encryption  systems  secure  against  quantum  chosen  ciphertext  attacks  where  decryption 
queries  are  superpositions  of  ciphertexts? 
Abstract

Proofs  of  retrievability  allow  a  client  to  store  her  data  on  a  remote  server  (e.g.,  “in  the  cloud”)  and 
periodically  execute  an  efficient  audit  protocol  to  check  that  all  of  the  data  is  being  maintained  correctly 
and  can  be  recovered  from  the  server.  For  efficiency,  the  computation  and  communication  of  the  server 
and  client  during  an  audit  protocol  should  be  significantly  smaller  than  reading/transmitting  the  data  in 
its  entirety.  Although  the  server  is  only  asked  to  access  a  few  locations  of  its  storage  during  an  audit,  it 
must  maintain  full  knowledge  of  all  client  data  to  be  able  to  pass. 

Starting  with  the  work  of  Juels  and  Kaliski  (CCS  ’07),  all  prior  solutions  to  this  problem  crucially 
assume  that  the  client  data  is  static  and  do  not  allow  it  to  be  efficiently  updated.  Indeed,  they  all  store  a 
redundant  encoding  of  the  data  on  the  server,  so  that  the  server  must  delete  a  large  fraction  of  its  storage 
to  ‘lose’  any  actual  content.  Unfortunately,  this  means  that  even  a  single  bit  modification  to  the  original 
data  will  need  to  modify  a  large  fraction  of  the  server  storage,  which  makes  updates  highly  inefficient. 
Overcoming  this  limitation  was  left  as  the  main  open  problem  by  all  prior  works. 

In  this  work,  we  give  the  first  solution  providing  proofs  of  retrievability  for  dynamic  storage,  where  the 
client  can  perform  arbitrary  reads/writes  on  any  location  within  her  data  by  running  an  efficient  protocol 
with  the  server.  At  any  point  in  time,  the  client  can  execute  an  efficient  audit  protocol  to  ensure  that  the 
server  maintains  the  latest  version  of  the  client  data.  The  computation  and  communication  complexity  of 
the  server  and  client  in  our  protocols  is  only  polylogarithmic  in  the  size  of  the  client’s  data.  The  starting 
point  of  our  solution  is  to  split  up  the  data  into  small  blocks  and  redundantly  encode  each  block  of  data 
individually,  so  that  an  update  inside  any  data  block  only  affects  a  few  codeword  symbols.  The  main 
difficulty  is  to  prevent  the  server  from  identifying  and  deleting  too  many  codeword  symbols  belonging 
to  any  single  data  block.  We  do  so  by  hiding  where  the  various  codeword  symbols  for  any  individual 
data  block  are  stored  on  the  server  and  when  they  are  being  accessed  by  the  client,  using  the  algorithmic 
techniques  of  oblivious  RAM. 


*IBM Research, T.J. Watson, Hawthorne, NY, USA. cdc@gatech.edu

†Koç University, Istanbul, Turkey. akupcu@ku.edu.tr

‡IBM Research, T.J. Watson, Hawthorne, NY, USA. wichs@cs.nyu.edu


9.  Dynamic  Proofs  of  Retrievability  via  Oblivious  RAM 


1  Introduction 


Cloud storage systems (Amazon S3, Dropbox, Google Drive, etc.) are becoming increasingly popular as a
means  of  storing  data  reliably  and  making  it  easily  accessible  from  any  location.  Unfortunately,  even  though 
the  remote  storage  provider  may  not  be  trusted,  current  systems  provide  few  security  or  integrity  guarantees. 

Guaranteeing  the  privacy  and  authenticity  of  remotely  stored  data  while  allowing  efficient  access  and 
updates is non-trivial, and relates to the study of oblivious RAMs and memory checking, which we will
return  to  later.  The  main  focus  of  this  work,  however,  is  an  orthogonal  question:  How  can  we  efficiently 
verify  that  the  entire  client  data  is  being  stored  on  the  remote  server  in  the  first  place?  In  other  words,  what 
prevents  the  server  from  deleting  some  portion  of  the  data  (say,  an  infrequently  accessed  sector)  to  save  on 
storage? 

Provable  Storage.  Motivated  by  the  questions  above,  there  has  been  much  cryptography  and  security 
research  in  creating  a  provable  storage  mechanism,  where  an  untrusted  server  can  prove  to  a  client  that  her 
data  is  kept  intact.  More  precisely,  the  client  can  run  an  efficient  audit  protocol  with  the  untrusted  server, 
guaranteeing  that  the  server  can  only  pass  the  audit  if  it  maintains  full  knowledge  of  the  entire  client  data. 
This  is  formalized  by  requiring  that  the  data  can  be  efficiently  extracted  from  the  server  given  its  state  at 
the  beginning  of  any  successful  audit.  One  may  think  of  this  as  analogous  to  the  notion  of  extractors  in  the 
definition of zero-knowledge proofs of knowledge [14, 4].

One  trivial  audit  mechanism,  which  accomplishes  the  above,  is  for  the  client  to  simply  download  all  of 
her  data  from  the  server  and  check  its  authenticity  (e.g.,  using  a  MAC).  However,  for  the  sake  of  efficiency, 
we insist that the computation and communication of the server and client during an audit protocol be much
smaller than the potentially huge size of the client’s data. In particular, the server shouldn’t even have to
read all of the client’s data to run the audit protocol, let alone transmit it. A scheme that accomplishes the
above is called a Proof of Retrievability (PoR).

Prior  Techniques.  The  first  PoR  schemes  were  defined  and  constructed  by  Juels  and  Kaliski  [19],  and 
have since received much attention. We review the prior work and closely related primitives (e.g.,
sublinear  authenticators  [23]  and  provable  data  possession  [1])  in  Section  1.2. 

On  a  very  high  level,  all  PoR  constructions  share  essentially  the  same  common  structure.  The  client 
stores  some  redundant  encoding  of  her  data  under  an  erasure  code  on  the  server,  ensuring  that  the  server 
must  delete  a  significant  fraction  of  the  encoding  before  losing  any  actual  data.  During  an  audit,  the  client 
then  checks  a  few  random  locations  of  the  encoding,  so  that  a  server  who  deleted  a  significant  fraction  will 
get  caught  with  overwhelming  probability. 

More precisely, let us model the client’s input data as a string $M \in \Sigma^\ell$ consisting of $\ell$ symbols from
some small alphabet $\Sigma$, and let $\mathsf{Enc} : \Sigma^\ell \to \Sigma^{\ell'}$ denote an erasure code that can correct the erasure of up
to $1/2$ of its output symbols. The client stores $\mathsf{Enc}(M)$ on the server. During an audit, the client selects
a small random subset of $t$ out of the $\ell'$ locations in the encoding, and challenges the server to respond
with the corresponding values, which it then checks for authenticity (e.g., using MAC tags). Intuitively,
if the server deletes more than half of the values in the encoding, it will get caught with overwhelming
probability $\geq 1 - 2^{-t}$ during the audit, and otherwise it retains knowledge of the original data because of
the redundancy of the encoding. The complexity of the audit protocol is only proportional to $t$, which can
be set to the security parameter and is independent of the size of the client data.¹
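The detection bound can be checked numerically. The sketch below assumes that challenges are sampled independently and uniformly with replacement, and it treats the MAC check as a black box. Both are simplifying assumptions, and the function names are hypothetical.

```python
import random


def detection_probability(deleted_fraction: float, t: int) -> float:
    """Probability that at least one of t challenges, drawn uniformly with
    replacement, lands on a deleted codeword symbol."""
    return 1.0 - (1.0 - deleted_fraction) ** t


def server_passes(num_symbols: int, deleted: set[int], t: int, rng: random.Random) -> bool:
    """One simulated audit: the server passes iff no challenged location was deleted.
    (MAC verification of the returned values is abstracted away.)"""
    challenges = [rng.randrange(num_symbols) for _ in range(t)]
    return all(loc not in deleted for loc in challenges)


rng = random.Random(0)
deleted = set(range(500))  # server drops half of a 1000-symbol encoding
passes = sum(server_passes(1000, deleted, 10, rng) for _ in range(10_000))
print(detection_probability(0.5, 10))  # 1 - 2**-10
print(passes / 10_000)                 # empirical pass rate, close to 2**-10
```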

¹ Some of the more advanced PoR schemes (e.g., [27, 10]) optimize the communication complexity of the audit even further by cleverly compressing the $t$ codeword symbols and their authentication tags in the server’s response.

Difficulty of Updates. One of the main limitations of all prior PoR schemes is that they do not support
efficient updates to the client data. Under the above template for PoR, if the client wants to modify even a
single location of $M$, she will end up needing to change the values of at least half of the locations in $\mathsf{Enc}(M)$
on the server, requiring a large amount of work (linear in the size of the client data). Constructing a PoR
scheme  that  allows  for  efficient  updates  was  stated  as  the  main  open  problem  by  Juels  and  Kaliski  [19].  We 
emphasize  that,  in  the  setting  of  updates,  the  audit  protocol  must  ensure  that  the  server  correctly  maintains 
knowledge  of  the  latest  version  of  the  client  data,  which  includes  all  of  the  changes  incurred  over  time. 
Before  we  describe  our  solution  to  this  problem,  let  us  build  some  intuition  about  the  challenges  involved 
by  examining  two  natural  but  flawed  proposals. 

First Proposal. A natural attempt to overcome the inefficiency of updating a huge redundant encoding
is to encode the data “locally” so that a change to one position of the data only affects a small number
of codeword symbols. More precisely, instead of using an erasure code that takes all $\ell$ data symbols as
input, we can use a code $\mathsf{Enc} : \Sigma^k \to \Sigma^n$ that works on small blocks of only $k \ll \ell$ symbols encoded into
$n$ symbols. The client divides the data $M$ into $L = \ell/k$ message blocks $(m_1, \ldots, m_L)$, where each block
$m_i \in \Sigma^k$ consists of $k$ symbols. The client redundantly encodes each message block $m_i$ individually into
a corresponding codeword block $c_i = \mathsf{Enc}(m_i) \in \Sigma^n$ using the above code with small inputs. Finally, the
client concatenates these codeword blocks to form the value $C = (c_1, \ldots, c_L) \in \Sigma^{Ln}$, which it stores on the
server. Auditing works as before: the client randomly chooses $t$ of the $L \cdot n$ locations in $C$ and challenges
the server to respond with the corresponding codeword symbols in these locations, which it then tests for
authenticity.² The client can now read/write to any location within her data by simply reading/writing to
the $n$ relevant codeword symbols on the server.

The above proposal can be made secure when the block size $k$ (which determines the complexity of
reads/updates) and the number of challenged locations $t$ (which determines the complexity of the audit) are
both set to $O(\sqrt{\ell})$, where $\ell$ is the size of the data (see Appendix A for details). This way, the audit is likely to
check sufficiently many values in each codeword block $c_i$. Unfortunately, if we want a truly efficient scheme
and set $n, t = o(\sqrt{\ell})$ to be small, then this solution becomes completely insecure. The server can delete a
single codeword block $c_i$ from $C$ entirely, losing the corresponding message block $m_i$, but still maintain a
good chance of passing the above audit as long as none of the $t$ random challenge locations coincides with
the $n$ deleted symbols, which happens with good probability.
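To quantify this attack, here is a short sketch that assumes 0-indexed positions and challenges drawn uniformly with replacement. The deleted block occupies a $1/L$ fraction of $C$, so the audit misses it with probability $(1 - 1/L)^t$. That probability only becomes small once $t$ is large relative to $L = \ell/k$, which is what the square-root parameter setting above arranges. The function names and sizes are hypothetical.

```python
def block_positions(i: int, n: int) -> list[int]:
    """0-indexed locations in C = (c_1, ..., c_L) holding codeword block i (0-indexed)."""
    return list(range(i * n, (i + 1) * n))


def miss_probability(num_blocks: int, t: int) -> float:
    """Probability that t uniform challenges (with replacement) over all L*n
    locations avoid the n symbols of one deleted block: (1 - n/(L*n))**t."""
    return (1.0 - 1.0 / num_blocks) ** t


# Hypothetical sizes: 2**24 blocks, t = 128 challenges.
print(block_positions(2, 4))          # [8, 9, 10, 11]
print(miss_probability(2**24, 128))   # about 0.99999: the deletion almost surely goes unnoticed
```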

Second Proposal. The first proposal (with small $n, t$) was insecure because a cheating server could easily
identify the locations within $C$ that correspond to a single message block and delete exactly the codeword
symbols in these locations. We can prevent such attacks by pseudorandomly permuting the locations of all
of the different codeword symbols of different codeword blocks together. That is, the client starts with the
value $C = (C[1], \ldots, C[Ln]) = (c_1, \ldots, c_L) \in \Sigma^{Ln}$ computed as in the first proposal. It chooses a pseudorandom
permutation $\pi : [Ln] \to [Ln]$ and computes the permuted value $C' := (C[\pi(1)], \ldots, C[\pi(Ln)])$,
which it then stores on the server in an encrypted form (each codeword symbol is encrypted separately).
The audit still checks $t$ out of $Ln$ random locations of the server storage and verifies authenticity.

It may seem that the server now cannot immediately identify and selectively delete codeword symbols
belonging to a single codeword block, thwarting the attack on the first proposal. Unfortunately, this
modification only regains security in the static setting, when the client never performs any operations on the
data.³ Once the client wants to update some location of $M$ that falls inside some message block $m_i$, she
has to reveal to the server where all of the $n$ codeword symbols corresponding to $c_i = \mathsf{Enc}(m_i)$ reside in
its storage, since she needs to update exactly these values. Therefore, the server can later selectively delete
exactly these $n$ codeword symbols, leading to the same attack as in the first proposal.
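The leak can be made concrete with a small sketch. It assumes 0-indexed locations and uses `random.Random.shuffle` purely as a stand-in for the pseudorandom permutation $\pi$. That stand-in is not cryptographically secure, and encryption of the symbols is omitted. All names are hypothetical. The sketch shows that the physical locations an update to block $i$ must touch are exactly those holding $c_i$.

```python
import random


def make_permutation(size: int, seed: int) -> list[int]:
    """Stand-in for the pseudorandom permutation pi on [size] (0-indexed).
    NOT cryptographic: random.Random is used only for illustration."""
    perm = list(range(size))
    random.Random(seed).shuffle(perm)
    return perm


def permute(C: list, perm: list[int]) -> list:
    """C'[j] = C[pi(j)]: physical location j holds logical symbol pi(j)."""
    return [C[perm[j]] for j in range(len(C))]


def physical_positions(perm: list[int], block: int, n: int) -> list[int]:
    """Physical locations in C' holding the n symbols of codeword block `block`.
    Updating that block touches exactly these locations, revealing them."""
    inverse = [0] * len(perm)
    for j, q in enumerate(perm):
        inverse[q] = j
    return sorted(inverse[q] for q in range(block * n, (block + 1) * n))


L, n = 8, 4
C = list(range(L * n))  # logical symbol q is stored as the value q
perm = make_permutation(L * n, seed=1)
C_prime = permute(C, perm)
leaked = physical_positions(perm, block=2, n=n)
print(sorted(C_prime[j] for j in leaked))  # [8, 9, 10, 11]: the whole block
```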

² This requires that we can efficiently check the authenticity of the remotely stored data $C$, while supporting efficient updates on it. This problem is solved by memory checking (see our survey of related work in Section 1.2).

³ A variant of this idea was actually used by Juels and Kaliski [19] for extra efficiency in the static setting.

Impossibility? Given the above failed attempts, it may even seem that truly efficient updates could be
inherently incompatible with efficient audits in PoR. If an update is efficient and only changes a small
subset of the server’s storage, then the server can always just ignore the update, thereby failing to maintain
knowledge of the latest version of the client data. All of the prior techniques appear ineffective against
such an attack. More generally, any audit protocol that just checks a small subset of random locations of the
server’s storage is unlikely to hit any of the locations involved in the update, and hence will not detect such
cheating, meaning that it cannot be secure. However, this does not rule out the possibility of a very efficient
solution that relies on a cleverer audit protocol, which is more likely to check recently updated areas of the
server’s storage and therefore detect such an attack. Indeed, this property will be an important component
in our actual solution.

1.1  Our  Results  and  Techniques 

Overview  of  Result.  In  this  work,  we  give  the  first  solution  to  dynamic  PoR  that  allows  for  efficient 
updates  to  client  data.  The  client  only  keeps  some  short  local  state,  and  can  execute  arbitrary  read/write 
operations  on  any  location  within  the  data  by  running  a  corresponding  protocol  with  the  server.  At  any 
point  in  time,  the  client  can  also  initiate  an  audit  protocol,  which  ensures  that  a  passing  server  must  have 
complete  knowledge  of  the  latest  version  of  the  client  data.  The  cost  of  any  read/write/audit  execution 
in terms of server/client work and communication is only polylogarithmic in the size of the client data.
The  server’s  storage  remains  linear  in  the  size  of  the  client  data.  Therefore,  our  scheme  is  optimal  in  an 
asymptotic  sense,  up  to  polylogarithmic  factors.  See  Section  7  for  a  detailed  efficiency  analysis. 

PoR via Oblivious RAM. Our dynamic PoR solution starts with the same idea as the first proposal
above, where the client redundantly encodes small blocks of her data individually to form the value
$C = (c_1, \ldots, c_L) \in \Sigma^{Ln}$, consisting of $L$ codeword blocks and $\ell' = Ln$ codeword symbols, as defined previously.
The goal is to then store $C$ on the server in some “clever way” so that the server cannot selectively
delete too many symbols within any single codeword block $c_i$, even after observing the client’s read and write
executions (which access exactly these symbols). As highlighted by the second proposal, simply permuting
the locations of the codeword symbols of $C$ is insufficient. Instead, our main idea is to store all of the
individual codeword symbols of $C$ on the server using an oblivious RAM scheme.

Overview  of  ORAM.  Oblivious  RAM  (ORAM),  initially  defined  by  Goldreich  and  Ostrovsky  [13],  allows 
a  client  to  outsource  her  memory  to  a  remote  server  while  allowing  the  client  to  perform  random-access  reads 
and writes in a private way. More precisely, the client has some data $D \in \Sigma^d$, which she stores on the server
in  some  carefully  designed  privacy-preserving  form,  while  only  keeping  a  short  local  state.  She  can  later  run 
efficient  protocols  with  the  server  to  read  or  write  to  the  individual  entries  of  D.  The  read/write  protocols 
of  the  ORAM  scheme  should  be  efficient,  and  the  client/server  work  and  communication  during  each  such 
protocol should be small compared to the size of $D$ (e.g., polylogarithmic). A secure ORAM scheme not
only  hides  the  content  of  D  from  the  server,  but  also  the  access  pattern  of  which  locations  in  D  the  client  is 
reading  or  writing  in  each  protocol  execution.  Thus,  the  server  cannot  discern  any  correlation  between  the 
physical  locations  of  its  storage  that  it  is  asked  to  access  during  each  read/write  protocol  execution  and  the 
logical  location  inside  D  that  the  client  wants  to  access  via  this  protocol. 
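As a baseline for what "oblivious" means, here is a minimal sketch of the trivial ORAM that scans the whole memory on every access. This is not the ORAM used in this work, since Section 6 reviews efficient schemes. It achieves the access-pattern hiding property, but at linear rather than polylogarithmic cost. Encryption and re-randomization of cells, which are needed to also hide the contents and which cell changed, are omitted. The class and method names are hypothetical.

```python
class LinearScanORAM:
    """Trivial ORAM: every logical access reads and rewrites every physical cell.

    The server-visible trace is the same for every (op, addr), so the access
    pattern is hidden, but each access costs O(d) instead of polylog(d).
    Encryption/re-randomization of cells is omitted in this sketch.
    """

    def __init__(self, data: list) -> None:
        self._server = list(data)   # server-side storage
        self.trace: list[int] = []  # physical locations touched, as the server sees them

    def access(self, op: str, addr: int, value=None):
        """Perform a logical read or write; returns the value previously stored at addr."""
        if op not in ("read", "write"):
            raise ValueError("op must be 'read' or 'write'")
        result = None
        for j in range(len(self._server)):
            self.trace.append(j)
            cell = self._server[j]  # server sends cell j to the client
            if j == addr:
                result = cell
                if op == "write":
                    cell = value
            self._server[j] = cell  # client sends (re-encrypted) cell j back
        return result


a = LinearScanORAM([10, 20, 30, 40])
b = LinearScanORAM([10, 20, 30, 40])
print(a.access("read", 1))  # 20
b.access("write", 3, 99)
print(a.trace == b.trace)   # True: the server cannot tell the two accesses apart
```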

We  review  the  literature  and  efficiency  of  ORAM  schemes  in  Section  6.  In  our  work,  we  will  also  always 
use  ORAM  schemes  that  are  authenticated,  which  means  that  the  client  can  detect  if  the  server  ever  sends 
an  incorrect  value.  In  particular,  authenticated  ORAM  schemes  ensure  that  the  most  recent  version  of  the 
data  is  being  retrieved  in  any  accepting  read  execution,  preventing  the  server  from  “rolling  back”  updates. 

⁴ The above only holds when the complexity of the updates and the audit are both $o(\sqrt{\ell})$, where $\ell$ is the size of the data. See Appendix A for a simple protocol of this form that achieves square-root complexity.

Figure 1: Our Construction

Construction of Dynamic PoR. A detailed technical description of our construction appears in
Section 5, and below we give a simplified overview. In our PoR construction, the client starts with data
$M \in \Sigma^\ell$, which she splits into small message blocks $M = (m_1, \ldots, m_L)$ with $m_i \in \Sigma^k$, where the block
size $k \ll \ell = Lk$ depends only on the security parameter. She then applies an error-correcting code
$\mathsf{Enc} : \Sigma^k \to \Sigma^n$ that can efficiently recover from $n/2$ erasures to each message block individually, resulting in the
value $C = (c_1, \ldots, c_L) \in \Sigma^{Ln}$, where $c_i = \mathsf{Enc}(m_i)$. Finally, she initializes an ORAM scheme with the initial
data $D = C$, which the ORAM stores on the server in some clever privacy-preserving form, while keeping
only a short local state at the client.
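The overview leaves the choice of code open. One natural instantiation, which is our assumption and is not fixed by the text here, is a Reed-Solomon code over a prime field with $n = 2k$. Any $k$ of the $n$ symbols determine the message, so up to $n/2$ erasures are tolerated. The sketch below uses $P = 2^{31} - 1$ and the hypothetical names `enc` and `dec`. It favors clarity over speed, since decoding is $O(k^3)$.

```python
# Assumed field modulus: the prime 2^31 - 1; message symbols must lie in [0, P).
P = 2**31 - 1


def enc(msg: list[int], n: int) -> list[int]:
    """Reed-Solomon encoding: evaluate the polynomial with coefficients
    msg (lowest degree first) at the points x = 1, ..., n."""
    out = []
    for x in range(1, n + 1):
        acc = 0
        for coeff in reversed(msg):
            acc = (acc * x + coeff) % P
        out.append(acc)
    return out


def dec(word: list, k: int) -> list[int]:
    """Recover the k message symbols from any k non-erased positions
    (None marks an erasure) by Lagrange interpolation over GF(P)."""
    points = [(i + 1, y) for i, y in enumerate(word) if y is not None][:k]
    if len(points) < k:
        raise ValueError("too many erasures to decode")
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(points):
        basis = [1]  # coefficients of prod_{m != j} (x - x_m)
        denom = 1    # prod_{m != j} (x_j - x_m)
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            nxt = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):  # multiply basis by (x - xm)
                nxt[d] = (nxt[d] - xm * c) % P
                nxt[d + 1] = (nxt[d + 1] + c) % P
            basis = nxt
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, P - 2, P) % P  # y_j / denom via Fermat inversion
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs


k, n = 4, 8  # n = 2k tolerates n/2 = 4 erasures
msg = [5, 7, 11, 13]
word = enc(msg, n)
for i in (0, 2, 5, 7):  # erase half of the codeword
    word[i] = None
print(dec(word, k))     # [5, 7, 11, 13]
```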

Whenever the client wants to read or write to some location within her data, she uses the ORAM scheme
to perform the necessary reads/writes on each of the $n$ relevant codeword symbols of $C$ (see details in
Section 5). To run an audit, the client chooses $t$ ($\approx$ security parameter) random locations in $\{1, \ldots, Ln\}$
and runs the ORAM read protocol $t$ times to read the corresponding symbols of $C$ that reside in these
locations, checking them for authenticity.
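Section 5 gives the precise protocols. The sketch below covers only the index bookkeeping, assuming 0-indexed positions and the layout $C = (c_1, \ldots, c_L)$ from above. The function names are hypothetical, and the ORAM calls themselves appear only as comments.

```python
import random


def codeword_indices(data_pos: int, k: int, n: int) -> list[int]:
    """0-indexed locations in C of the n codeword symbols that encode the
    message block containing data position data_pos (block = data_pos // k)."""
    block = data_pos // k
    return list(range(block * n, (block + 1) * n))


def audit_locations(num_blocks: int, n: int, t: int, rng: random.Random) -> list[int]:
    """t uniformly random logical locations in C = (c_1, ..., c_L); the client
    reads each one with a separate ORAM read protocol execution."""
    return [rng.randrange(num_blocks * n) for _ in range(t)]


# Writing data position 9 with k = 4, n = 8: the client ORAM-reads these 8
# symbols, decodes the block, changes one symbol, re-encodes, and ORAM-writes all 8 back.
print(codeword_indices(9, k=4, n=8))  # [16, 17, 18, 19, 20, 21, 22, 23]
print(audit_locations(num_blocks=1000, n=8, t=5, rng=random.Random(7)))
```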


Catching Disregarded Updates. First, let us start with a sanity check, to explain how the above
construction can thwart a specific attack in which the server simply disregards the latest update. In particular,
such an attack should be caught by a subsequent audit. During the audit, the client runs the ORAM
protocol to read $t$ random codeword symbols, and these are unlikely to coincide with any of the $n$ codeword
symbols modified by the latest update (recall that $t$ and $n$ are both small and independent of the data
size $\ell$). However, the ORAM scheme stores data on the server in a highly organized data structure, and
ensures that the most recently updated data is accessed during any subsequent “read” execution, even for
an unrelated logical location. This is implied by ORAM security, since we need to hide whether or not the
location of a read was recently updated. Therefore, although the audit executes the “ORAM read”
protocols on random logical locations inside $C$, the ORAM scheme will end up scanning recently updated
areas of the server’s actual storage and checking them for authenticity, ensuring that recent updates have not
been disregarded.

Security  and  “Next-Read  Pattern  Hiding”.  The  high-level  security  intuition  for  our  PoR  scheme  is 
quite  simple.  The  ORAM  hides  from  the  server  where  the  various  locations  of  C  reside  in  its  storage,  even 
after  observing  the  access  pattern  of  read/write  executions.  Therefore  it  is  difficult  for  the  server  to  reach  a 
state  where  it  will  fail  on  read  executions  for  most  locations  within  some  single  codeword  block  (lose  data) 
without  also  failing  on  too  many  read  executions  altogether  (lose  the  ability  to  pass  an  audit). 

Making the above intuition formal is quite subtle, and it turns out that the standard notion of ORAM
security does not suffice. The main issue is that the server may be able to somehow delete all (or
most) of the $n$ codeword symbols that fall within some codeword block $c_i = (C[j+1], \ldots, C[j+n])$ without
knowing which block it deleted. Therefore, although the server will fail on any subsequent read if and only
if its location falls within the range $\{j+1, \ldots, j+n\}$, it will not learn anything about the location of the
read itself, since it does not know the index $j$. Indeed, we will give an example of a contrived ORAM scheme
where such an attack is possible and our resulting construction of PoR using this ORAM is insecure.

We show, however, that the intuitive reasoning above can be salvaged if the ORAM scheme achieves a
new notion of security that we call next-read pattern hiding (NRPH), which may be of independent interest.
NRPH security considers an adversarial server that first gets to observe many read/write protocol executions
performed sequentially with the client, resulting in some final client configuration $C_{\mathrm{fin}}$. The adversarial server
then gets to see various possibilities for how the “next read” operation would be executed by the client for
various distinct locations, where each such execution starts from the same fixed client configuration $C_{\mathrm{fin}}$.⁵
The server should not be able to discern any relationship between these executions and the locations they
are reading. For example, two such “next-read” executions where the client reads two consecutive locations
should be indistinguishable from two executions that read two random and unrelated locations. This notion
of NRPH security will be used to show that the server cannot reach a state where it can selectively fail to respond
to read queries whose location falls within some small range of a single codeword block (lose data), but still
respond correctly to most completely random reads (pass an audit).

Proving  Security  via  an  Extractor.  As  mentioned  earlier,  the  security  of  PoR  is  formalized  via  an 
extractor  and  we  now  give  a  high-level  overview  of  how  such  an  extractor  works.  In  particular,  we  claim 
that  we  can  take  any  adversarial  server  that  has  a  “good”  chance  of  passing  an  audit  and  use  the  extractor 
to  efficiently  recover  the  latest  version  of  the  client  data  from  it.  The  extractor  initializes  an  “empty  array” 
C.  It  then  executes  random  audit  protocols  with  the  server,  by  acting  as  the  honest  client.  In  particular, 
it  chooses  t  random  locations  within  the  array  and  runs  the  corresponding  ORAM  read  protocols.  If  the 
execution  of  the  audit  is  successful,  the  extractor  fills  in  the  corresponding  values  of  C  that  it  learned  during 
the  audit  execution.  In  either  case,  it  then  rewinds  the  server  and  runs  a  fresh  execution  of  the  audit, 
repeating  this  step  for  several  iterations. 
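The extractor loop can be sketched as follows. This is a simplification. The server is modeled as failing whenever a challenge hits a fixed hidden set of logical locations, whereas a real server's behavior depends on the ORAM state and the full protocol transcript. Authenticity checks are assumed to happen inside `audit`. All names are hypothetical.

```python
import random
from typing import Callable, Optional

Audit = Callable[[list[int]], Optional[list[int]]]


def extract(audit: Audit, num_positions: int, t: int, iterations: int,
            rng: random.Random) -> list[Optional[int]]:
    """Rewinding extractor sketch.

    `audit(locations)` models one audit execution started from the server's
    fixed state: it returns the values read if the audit passes, else None.
    Calling it again from the same state plays the role of rewinding.
    """
    recovered: list[Optional[int]] = [None] * num_positions
    for _ in range(iterations):
        locations = [rng.randrange(num_positions) for _ in range(t)]
        values = audit(locations)
        if values is not None:  # only successful audits contribute
            for loc, val in zip(locations, values):
                recovered[loc] = val
    return recovered


# Toy server: answers correctly unless a challenge hits one of its "lost" locations.
C = list(range(100))
lost = set(range(0, 100, 10))


def toy_audit(locs: list[int]) -> Optional[list[int]]:
    if any(loc in lost for loc in locs):
        return None
    return [C[loc] for loc in locs]


rec = extract(toy_audit, len(C), t=3, iterations=5000, rng=random.Random(0))
print(sum(v is not None for v in rec))  # number of entries of C recovered
```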

Since the server has a good chance of passing a random audit, it is easy to show that the extractor can
eventually recover a large fraction, say $\geq 3/4$, of the entries inside $C$ by repeating this process sufficiently many
times. Because of the authenticity of the ORAM, the recovered values are the correct ones, corresponding to
the latest version of the client data. Now we need to argue that there is no codeword block $c_i$ within $C$ for
which the extractor recovered fewer than $n/2$ of its codeword symbols, as this would prevent us from applying
erasure decoding and recovering the underlying message block. Let FAILURE denote the above bad event.
If all the recovered locations (comprising a $\geq 3/4$ fraction of the total) were distributed uniformly within $C$,
then FAILURE would occur with negligible probability, as long as the codeword size $n$ is sufficiently large
in  the  security  parameter.  We  can  now  rely  on  the  NRPH  security  of  the  ORAM  to  ensure  that  FAILURE 
also  happens  with  negligible  probability  in  our  case.  We  can  think  of  the  FAILURE  event  as  a  function  of 
the  locations  queried  by  the  extractor  in  each  audit  execution,  and  the  set  of  executions  on  which  the  server 
fails.  If  the  malicious  server  can  cause  FAILURE  to  occur,  it  means  that  it  can  distinguish  the  pattern 
of  locations  actually  queried  by  the  extractor  during  the  audit  executions  (for  which  the  FAILURE  event 
occurs)  from  a  randomly  permuted  pattern  of  locations  (for  which  the  FAILURE  event  does  not  occur  with 
overwhelming  probability).  Note  that  the  use  of  rewinding  between  the  audit  executions  of  the  extractor 
forces  us  to  rely  on  NRPH  security  rather  than  just  standard  ORAM  security. 
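To see why the uniform case is harmless, model each symbol as recovered independently with probability $p = 3/4$. This is an idealization of the uniform case; the actual argument goes through NRPH. The chance that a block of $n$ symbols retains fewer than $n/2$ of them decays exponentially in $n$, and a union bound over the $L$ blocks stays small. The function names are hypothetical.

```python
from math import comb


def block_failure_probability(n: int, p: float) -> float:
    """P[Bin(n, p) < n/2]: chance that one codeword block has fewer than n/2
    recovered symbols when each symbol is recovered independently with prob. p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range((n + 1) // 2))


def failure_union_bound(num_blocks: int, n: int, p: float) -> float:
    """Union bound on FAILURE over all L codeword blocks."""
    return min(1.0, num_blocks * block_failure_probability(n, p))


for n in (16, 64, 128, 256):
    print(n, block_failure_probability(n, 0.75))
```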

The  above  presents  the  high-level  intuition  and  is  somewhat  oversimplified.  See  Section  4  for  the  formal 
definition  of  NRPH  security  and  Section  5  for  the  formal  description  of  our  dynamic  PoR  scheme  and  a 
rigorous  proof  of  security. 

⁵ This is in contrast to the standard sequential operations, where the client state is updated after each execution.
Achieving Next-Read Pattern Hiding. We show that standard ORAM security does not generically
imply  NRPH  security,  by  giving  a  contrived  scheme  that  satisfies  the  former  but  not  the  latter.  Nevertheless, 
many  natural  ORAM  constructions  in  the  literature  do  seem  to  satisfy  NRPH  security.  In  particular,  we 
examine  the  efficient  ORAM  construction  of  Goodrich  and  Mitzenmacher  [15]  and  prove  that  (with  minor 
modifications)  it  is  NRPH  secure. 

Contributions.  We  call  our  final  scheme  PORAM  since  it  combines  the  techniques  and  security  of  PoR  and 
ORAM.  In  particular,  other  than  providing  provable  dynamic  cloud  storage  as  was  our  main  goal,  our  scheme 
also  satisfies  the  strong  privacy  guarantees  of  ORAM,  meaning  that  it  hides  all  contents  of  the  remotely 
stored  data  as  well  as  the  access  pattern  of  which  locations  are  accessed  when.  It  also  provides  strong 
authenticity guarantees (same as memory checking; see Section 1.2), ensuring that any “read” execution
with  a  malicious  remote  server  is  guaranteed  to  return  the  latest  version  of  the  data  (or  detect  cheating). 

In  brief,  our  contributions  can  be  summarized  as  follows: 

•  We give the first asymptotically efficient solution to PoR for outsourced dynamic data, where a
successful audit ensures that the server knows the latest version of the client data. In particular:

—  Client  storage  is  small  and  independent  of  the  data  size. 

—  Server  storage  is  linear  in  the  data  size,  expanding  it  by  only  a  small  constant  factor. 

—  Communication and computation of client and server during read, write, and audit executions
are  polylogarithmic  in  the  size  of  the  client  data. 

•  Our  scheme  also  achieves  strong  privacy  and  authenticity  guarantees,  matching  those  of  oblivious  RAM 
and  memory  checking. 

•  We present a new security notion called “next-read pattern hiding (NRPH)” for ORAM and a
construction achieving this new notion, which may be of independent interest.

We  mention  that  the  PORAM  scheme  is  simple  to  implement  and  has  low  concrete  efficiency  overhead  on 
top  of  an  underlying  ORAM  scheme  with  NRPH  security.  There  is  much  recent  and  ongoing  research  activity 
in  instantiating/implementing  truly  practical  ORAM  schemes,  which  are  likely  to  yield  correspondingly 
practical  instantiations  of  our  PORAM  protocol. 

1.2  Related  Work 

Proofs  of  retrievability  for  static  data  were  initially  defined  and  constructed  by  Juels  and  Kaliski  [19], 
building  on  a  closely  related  notion  called  sublinear-authenticators  of  Naor  and  Rothblum  [23].  Concurrently, 
Ateniese  et  al.  [1]  defined  another  related  primitive  called  provable  data  possession  (PDP).  Since  then,  there 
has  been  much  ongoing  research  activity  on  PoR  and  PDP  schemes. 

PoR  vs.  PDP.  The  main  difference  between  PoR  and  PDP  is  the  notion  of  security  that  they  achieve. 
A  PoR  audit  guarantees  that  the  server  maintains  knowledge  of  all  of  the  client  data,  while  a  PDP  audit 
only  ensures  that  the  server  is  storing  most  of  the  client  data.  For  example,  in  a  PDP  scheme,  the  server 
may lose a small portion of client data (say 1 MB out of a 10 GB file) while still maintaining a high chance of
passing a future audit.⁶ On a technical level, the main difference in most prior PDP/PoR constructions is
that PoR schemes store a redundant encoding of the client data on the server. For a detailed comparison,
see Küpçü [21, 22].
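A back-of-the-envelope calculation shows why sampling-based audits tolerate small losses. The challenge counts below are hypothetical, and each challenge is assumed to be a uniformly random sample with replacement.

```python
def detection_probability(lost_fraction: float, challenges: int) -> float:
    """Chance that at least one of `challenges` uniform samples
    (with replacement) hits lost data."""
    return 1.0 - (1.0 - lost_fraction) ** challenges


lost_fraction = 1 / (10 * 1024)  # 1 MB out of 10 GB, binary units
for t in (100, 1000):
    print(t, detection_probability(lost_fraction, t))
```

With 100 challenges the loss is noticed only about 1% of the time, and with 1,000 challenges still under 10% of the time. In a PoR, the redundant encoding is what makes such a small loss harmless, since the data can still be extracted.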

⁶ An alternative way to use PDPs can also achieve full security, at the cost of requiring the server to read the entire client data
during an audit, while still minimizing the communication complexity. If the data is large, say 10 GB, this is vastly impractical.
Static Data. PoR and PDP schemes for static data (without updates) have received much research
attention [27, 10, 7, 2], with works improving on communication efficiency and exact security, yielding essentially
optimal solutions. Another interesting direction has been to extend these works to the multi-server setting
[6, 8, 9], where the client can use the audit mechanism to identify faulty machines and recover the data from
the others.

Dynamic  Data.  The  works  of  Ateniese  et  al.  [3],  Erway  et  al.  [12]  and  Wang  et  al.  [30]  show  how 
to achieve PDP security for dynamic data, supporting efficient updates. This is closely related to work on
memory  checking  [5,  23,  11],  which  studies  how  to  authenticate  remotely  stored  dynamic  data  so  as  to  allow 
efficient  reads/writes,  while  being  able  to  verify  the  authenticity  of  the  latest  version  of  the  data  (preventing 
the  server  from  “rolling  back”  updates  and  using  an  old  version).  Unfortunately,  these  techniques  alone 
cannot  be  used  to  achieve  the  stronger  notion  of  PoR  security.  Indeed,  the  main  difficulty  that  we  resolve 
in  this  work,  how  to  efficiently  update  redundantly  encoded  data ,  does  not  come  up  in  the  context  of  PDP. 

A recent work of Stefanov et al. [29] considers PoR for dynamic data, but in a more complex setting where an additional trusted “portal” performs some operations on behalf of the client, and can cache updates for an extended period of time. It is not clear if these techniques can be translated to the basic client/server setting, which we consider here. However, even in this modified setting, the complexity of the updates and the audit in that work is proportional to the square root of the data size, whereas ours is polylogarithmic.

2  Preliminaries 

Notation. Throughout, we use $\lambda$ to denote the security parameter. We identify efficient algorithms as those running in (probabilistic) polynomial time in $\lambda$ and their input lengths, and identify negligible quantities (e.g., acceptable error probabilities) as $\mathrm{negl}(\lambda) = 1/\lambda^{\omega(1)}$, meaning that they are asymptotically smaller than $1/\lambda^c$ for every constant $c > 0$. For $n \in \mathbb{N}$, we define the set $[n] = \{1, \ldots, n\}$. We use the notation $(k \bmod n)$ to denote the unique integer $i \in \{0, \ldots, n-1\}$ such that $i \equiv k \pmod{n}$.

Erasure Codes. We say that $(\mathsf{Enc}, \mathsf{Dec})$ is an $(n, k, d)_\Sigma$-code with efficient erasure decoding over an alphabet $\Sigma$ if the original message can always be recovered from a corrupted codeword with at most $d - 1$ erasures. That is, for every message $m = (m_1, \ldots, m_k) \in \Sigma^k$ giving a codeword $c = (c_1, \ldots, c_n) = \mathsf{Enc}(m)$, and every corrupted codeword $\tilde{c} = (\tilde{c}_1, \ldots, \tilde{c}_n)$ such that $\tilde{c}_i \in \{c_i, \bot\}$ and the number of erasures is $|\{i \in [n] : \tilde{c}_i = \bot\}| \le d - 1$, we have $\mathsf{Dec}(\tilde{c}) = m$. We say that a code is systematic if, for every message $m$, the codeword $c = \mathsf{Enc}(m)$ contains $m$ in the first $k$ positions $c_1 = m_1, \ldots, c_k = m_k$. A systematic variant of the Reed-Solomon code achieves the above for any integers $n > k$ and any field $\Sigma$ of size $|\Sigma| \ge n$ with $d = n - k + 1$.
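The following sketch shows one way to realize a systematic Reed-Solomon code with erasure decoding via Lagrange interpolation. It is an illustration under stated assumptions, not the construction used in any particular implementation: it works over the prime field $\mathrm{GF}(2^{31}-1)$ instead of $\mathbb{F}_{2^w}$, uses evaluation points $1, \ldots, n$, and represents an erasure $\bot$ as `None`; the function names `rs_encode` and `rs_decode` are hypothetical.

```python
from typing import List, Optional

P = 2**31 - 1  # prime modulus: GF(P) stands in for the field F_{2^w}


def _interpolate_at(xs: List[int], ys: List[int], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs[i], ys[i]) over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def rs_encode(m: List[int], n: int) -> List[int]:
    """Systematic (n, k, n-k+1) Reed-Solomon encoding with evaluation points 1..n.

    The polynomial is fixed by f(x) = m_x for x = 1..k, so c_1..c_k equal m_1..m_k."""
    k = len(m)
    if not (k < n < P) or any(not 0 <= v < P for v in m):
        raise ValueError("need k < n < P and symbols in [0, P)")
    xs = list(range(1, k + 1))
    return [_interpolate_at(xs, m, x) for x in range(1, n + 1)]


def rs_decode(c: List[Optional[int]], k: int) -> List[int]:
    """Erasure decoding: None marks an erased symbol; any k surviving symbols suffice."""
    known = [(x, v) for x, v in enumerate(c, start=1) if v is not None][:k]
    if len(known) < k:
        raise ValueError("more than d - 1 = n - k erasures")
    xs = [x for x, _ in known]
    ys = [v for _, v in known]
    return [_interpolate_at(xs, ys, x) for x in range(1, k + 1)]


# k = 3, n = 6, so d = 4: any 3 erasures are tolerated.
msg = [7, 1, 4]
codeword = rs_encode(msg, 6)
assert codeword[:3] == msg                       # systematic
damaged = [None, codeword[1], None, None, codeword[4], codeword[5]]
assert rs_decode(damaged, 3) == msg
```

Decoding needs only the $k$ surviving evaluations of a degree-$(k-1)$ polynomial, which is exactly why up to $d - 1 = n - k$ erasures can be tolerated.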

Virtual Memory. We think of virtual memory $M$, with word-size $w$ and length $\ell$, as an array $M \in \Sigma^\ell$ where $\Sigma \stackrel{\mathrm{def}}{=} \{0,1\}^w$. We assume that, initially, each location $M[i]$ contains the special uninitialized symbol $0 = 0^w$. Throughout, we will think of $\ell$ as some large polynomial in the security parameter, which upper bounds the amount of memory that can be used.

Outsourcing Virtual Memory. In the next two sections, we look at two primitives: dynamic PoR and ORAM. These primitives allow a client to outsource some virtual memory $M$ to a remote server, while providing useful security guarantees. Reading and writing to some location of $M$ now takes the form of a protocol execution with the server. The goal is to provide security while preserving efficiency in terms of client/server computation, communication, and the number of server-memory accesses per operation, which should all be polylogarithmic in the length $\ell$. We also want to optimize the size of the client storage (independent of $\ell$) and server storage (not much larger than $\ell$). We find this abstract view of outsourcing memory to be the simplest and most general to work with. Any higher-level data structures and operations (e.g., allowing appends/inserts to data or implementing an entire file system) can easily be built on top of this abstract notion of memory and therefore securely outsourced to the remote server.

9. Dynamic Proofs of Retrievability via Oblivious RAM

3  Dynamic  PoR 

A dynamic PoR scheme consists of protocols PInit, PRead, PWrite, Audit between two stateful parties: a client $C$ and a server $S$. The server acts as the curator for some virtual memory $M$, which the client can read, write and audit by initiating the corresponding interactive protocols:

• $\mathsf{PInit}(1^\lambda, 1^w, \ell)$: This protocol corresponds to the client initializing an (empty) virtual memory $M$ with word-size $w$ and length $\ell$, which it supplies as inputs.

• $\mathsf{PRead}(i)$: This protocol corresponds to the client reading $v = M[i]$, where it supplies the input $i$ and outputs some value $v$ at the end.

• $\mathsf{PWrite}(i, v)$: This protocol corresponds to setting $M[i] := v$, where the client supplies the inputs $i, v$.

• Audit: This protocol is used by the client to verify that the server is maintaining the memory contents correctly so that they remain retrievable. The client outputs a decision $b \in \{\mathsf{accept}, \mathsf{reject}\}$.

The client $C$ in the protocols may be randomized, but we assume (w.l.o.g.) that the honest server $S$ is deterministic. At the conclusion of the PInit protocol, both the client and the server create some long-term local state, which each party will update during the execution of each of the subsequent protocols. The client may also output reject during the execution of the PInit, PRead, PWrite protocols, to denote that it detected some misbehavior of the server. Note that we assume that the virtual memory is initially empty, but if the client has some initial data, she can write it onto the server block by block immediately after initialization. For ease of presentation, we may assume that the state of the client and the server always contains the security parameter and the memory parameters $(1^\lambda, 1^w, \ell)$.

We now define the three properties of a dynamic PoR scheme: correctness, authenticity and retrievability. For these definitions, we say that $P = (op_0, op_1, \ldots, op_q)$ is a dynamic PoR protocol sequence if $op_0 = \mathsf{PInit}(1^\lambda, 1^w, \ell)$ and, for $j > 0$, $op_j \in \{\mathsf{PRead}(i), \mathsf{PWrite}(i, v), \mathsf{Audit}\}$ for some index $i \in [\ell]$ and value $v \in \{0,1\}^w$.

Correctness. If the client and the server are both honest and $P = (op_0, \ldots, op_q)$ is some protocol sequence, then we require the following to occur with probability 1 over the randomness of the client:

• Each execution of a protocol $op_j = \mathsf{PRead}(i)$ results in the client outputting the correct value $v = M[i]$, matching what would happen if the corresponding operations were performed directly on a memory $M$. In particular, $v$ is the value contained in the most recent prior write operation with location $i$ or, if no such prior operation exists, $v = 0$.

•  Each  execution  of  the  Audit  protocol  results  in  the  decision  b  =  accept. 
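The correctness requirement compares the client's outputs against a plain memory. The sketch below is a hypothetical test oracle for that comparison: `IdealMemory` and `expected_reads` are illustrative names, operations are encoded as tuples, and words are integers in $[0, 2^w)$ with 0 standing for $0^w$. It computes what an honest client must output for every PRead in a protocol sequence; each Audit must simply return accept.

```python
from typing import Dict, List, Tuple


class IdealMemory:
    """The plain memory M in Sigma^l that the correctness definition compares against.

    Locations are 1..length; words are integers in [0, 2^w); 0 stands for 0^w."""

    def __init__(self, w: int, length: int) -> None:
        self.w, self.length = w, length
        self.cells: Dict[int, int] = {}

    def read(self, i: int) -> int:
        if not 1 <= i <= self.length:
            raise IndexError(i)
        return self.cells.get(i, 0)  # never written: the uninitialized symbol 0^w

    def write(self, i: int, v: int) -> None:
        if not 1 <= i <= self.length or not 0 <= v < 2 ** self.w:
            raise ValueError((i, v))
        self.cells[i] = v


def expected_reads(w: int, length: int, ops: List[Tuple]) -> List[int]:
    """Values an honest client must output, in order, for each ("read", i) in ops.

    ("write", i, v) updates the memory; ("audit",) must simply return accept."""
    mem, outputs = IdealMemory(w, length), []
    for op in ops:
        if op[0] == "read":
            outputs.append(mem.read(op[1]))
        elif op[0] == "write":
            mem.write(op[1], op[2])
    return outputs


ops = [("read", 2), ("write", 2, 200), ("audit",), ("read", 2),
       ("write", 2, 5), ("read", 2), ("read", 3)]
assert expected_reads(8, 4, ops) == [0, 200, 5, 0]
```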

Authenticity. We require that the client can always detect if any protocol message sent by the server deviates from honest behavior. More precisely, consider the following game $\mathsf{AuthGame}_{\tilde{S}}(\lambda)$ between a malicious server $\tilde{S}$ and a challenger:

• The malicious server $\tilde{S}(1^\lambda)$ specifies a valid protocol sequence $P = (op_0, \ldots, op_q)$.

• The challenger initializes a copy of the honest client $C$ and the (deterministic) honest server $S$. It sequentially executes $op_0, \ldots, op_q$ between $C$ and the malicious server $\tilde{S}$ while, in parallel, also passing a copy of every message from $C$ to the honest server $S$.

• If, at any point during the execution of some $op_j$, any protocol message given by $\tilde{S}$ differs from that of $S$, and the client $C$ does not output reject, the adversary wins and the game outputs 1. Else it outputs 0.
For any efficient adversarial server $\tilde{S}$, we require $\Pr[\mathsf{AuthGame}_{\tilde{S}}(\lambda) = 1] \le \mathrm{negl}(\lambda)$. Note that authenticity and correctness together imply that the client will always either read the correct value corresponding to the latest contents of the virtual memory or reject whenever interacting with a malicious server.

Retrievability. Finally, we define the main purpose of a dynamic PoR scheme, which is to ensure that the client data remains retrievable. We wish to guarantee that, whenever the malicious server is in a state with a reasonable probability $\delta$ of successfully passing an audit, he must know the entire content of the client’s virtual memory $M$. As in “proofs of knowledge”, we formalize knowledge via the existence of an efficient extractor $\mathcal{E}$ which can recover the value $M$ given (black-box) access to the malicious server.

More precisely, we define the game $\mathsf{ExtGame}_{\tilde{S},\mathcal{E}}(\lambda, p)$ between a malicious server $\tilde{S}$, extractor $\mathcal{E}$, and challenger:

• The malicious server $\tilde{S}(1^\lambda)$ specifies a protocol sequence $P = (op_0, \ldots, op_q)$. Let $M \in \Sigma^\ell$ be the correct value of the memory contents at the end of executing $P$.

• The challenger initializes a copy of the honest client $C$ and sequentially executes $op_0, \ldots, op_q$ between $C$ and $\tilde{S}$. Let $C_{\mathrm{fin}}$ and $\tilde{S}_{\mathrm{fin}}$ be the final configurations (states) of the client and malicious server at the end of this interaction, including all of the random coins of the malicious server. Define the success probability

$$\mathrm{Succ}(\tilde{S}_{\mathrm{fin}}) \stackrel{\mathrm{def}}{=} \Pr\left[\tilde{S}_{\mathrm{fin}} \stackrel{\mathsf{Audit}}{\longleftrightarrow} C_{\mathrm{fin}} = \mathsf{accept}\right]$$

as the probability that an execution of a subsequent Audit protocol between $\tilde{S}_{\mathrm{fin}}$ and $C_{\mathrm{fin}}$ results in the latter outputting accept. The probability is only over the random coins of $C_{\mathrm{fin}}$ during this execution.

• Run $M' \leftarrow \mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p)$, where the extractor $\mathcal{E}$ gets black-box rewinding access to the malicious server in its final configuration $\tilde{S}_{\mathrm{fin}}$, and attempts to extract the memory contents as $M'$.⁷

• If $\mathrm{Succ}(\tilde{S}_{\mathrm{fin}}) \ge 1/p$ and $M' \ne M$ then output 1, else 0.

We require that there exists a probabilistic polynomial-time extractor $\mathcal{E}$ such that, for every efficient malicious server $\tilde{S}$ and every polynomial $p = p(\lambda)$, we have $\Pr[\mathsf{ExtGame}_{\tilde{S},\mathcal{E}}(\lambda, p) = 1] \le \mathrm{negl}(\lambda)$.

The above says that whenever the malicious server reaches some state $\tilde{S}_{\mathrm{fin}}$ in which it maintains a $\delta \ge 1/p$ probability of passing the next audit, the extractor $\mathcal{E}$ will be able to extract the correct memory contents $M$ from $\tilde{S}_{\mathrm{fin}}$, meaning that the server must retain full knowledge of $M$ in this state. The extractor is efficient, but can run in time polynomial in $p$ and the size of the memory $\ell$.

A Note on Adaptivity. We defined the above authenticity and retrievability properties assuming that the sequence of read/write operations is adversarial, but is chosen non-adaptively, before the adversarial server sees any protocol executions. This seems to be sufficient in most realistic scenarios, where the server is unlikely to have any influence on which operations the client wants to perform. It also matches the security notions in prior works on ORAM. Nevertheless, we note that our final results also achieve adaptive security, where the attacker can choose the sequence of operations $op_i$ adaptively after seeing the execution of previous operations, if the underlying ORAM satisfies this notion. Indeed, most prior ORAM solutions seem to do so, but this was never included in their analysis.

4  Oblivious  RAM  with  Next-Read  Pattern  Hiding 

An ORAM consists of protocols (OInit, ORead, OWrite) between a client $C$ and a server $S$, with the same syntax as the corresponding protocols in PoR. We will also extend the syntax of ORead and OWrite to allow for reading/writing from/to multiple distinct locations simultaneously. That is, for arbitrary $t \in \mathbb{N}$, we define

⁷This is similar to the extractor in zero-knowledge proofs of knowledge. In particular, $\mathcal{E}$ can execute protocols with the malicious server in its state $\tilde{S}_{\mathrm{fin}}$ and rewind it back to this state at the end of the execution.


the protocol $\mathsf{ORead}(i_1, \ldots, i_t)$ for distinct indices $i_1, \ldots, i_t \in [\ell]$, in which the client outputs $(v_1, \ldots, v_t)$ corresponding to reading $v_1 = M[i_1], \ldots, v_t = M[i_t]$. Similarly, we define the protocol $\mathsf{OWrite}(i_1, \ldots, i_t; v_1, \ldots, v_t)$ for distinct indices $i_1, \ldots, i_t \in [\ell]$, which corresponds to setting $M[i_1] := v_1, \ldots, M[i_t] := v_t$.

We say that $P = (op_0, \ldots, op_q)$ is an ORAM protocol sequence if $op_0 = \mathsf{OInit}(1^\lambda, 1^w, \ell)$ and, for $j > 0$, $op_j$ is a valid (multi-location) read/write operation.

We require that an ORAM construction satisfy correctness and authenticity, which are defined the same way as in PoR.⁸ For privacy, we define a new property called next-read pattern hiding. For completeness, we also define the standard notion of ORAM pattern hiding in Appendix B.

Next-Read Pattern Hiding. Consider an honest-but-curious server $\mathcal{A}$ who observes the execution of some protocol sequence $P$ with a client $C$ resulting in the final client configuration $C_{\mathrm{fin}}$. At the end of this execution, $\mathcal{A}$ gets to observe how $C_{\mathrm{fin}}$ would execute the next read operation $\mathsf{ORead}(i_1, \ldots, i_t)$ for various different $t$-tuples $(i_1, \ldots, i_t)$ of locations, but always starting in the same client state $C_{\mathrm{fin}}$. We require that $\mathcal{A}$ cannot observe any correlation between these next-read executions and their locations, up to equality. That is, $\mathcal{A}$ should not be able to distinguish whether $C_{\mathrm{fin}}$ instead executes the next-read operations on permuted locations $\mathsf{ORead}(\pi(i_1), \ldots, \pi(i_t))$ for a permutation $\pi : [\ell] \to [\ell]$.

More formally, we define $\mathsf{NextReadGame}^b_{\mathcal{A}}(\lambda)$, for $b \in \{0, 1\}$, between an adversary $\mathcal{A}$ and a challenger:

• The attacker $\mathcal{A}(1^\lambda)$ chooses an ORAM protocol sequence $P_1 = (op_0, \ldots, op_{q_1})$. It also chooses a sequence $P_2 = (rop_1, \ldots, rop_{q_2})$ of valid multi-location read operations, where each operation is of the form $rop_j = \mathsf{ORead}(i_{j,1}, \ldots, i_{j,t_j})$ with $t_j$ distinct locations. Lastly, it chooses a permutation $\pi : [\ell] \to [\ell]$. For each $rop_j$ in $P_2$, define a permuted version $rop'_j := \mathsf{ORead}(\pi(i_{j,1}), \ldots, \pi(i_{j,t_j}))$. The game now proceeds in two stages.

• Stage I. The challenger initializes the honest client $C$ and the (deterministic) honest server $S$. It sequentially executes the protocols $P_1 = (op_0, \ldots, op_{q_1})$ between $C$ and $S$. Let $C_{\mathrm{fin}}, S_{\mathrm{fin}}$ be the final configurations of the client and server at the end.

• Stage II. For each $j \in [q_2]$: the challenger executes either the original operation $rop_j$ if $b = 0$, or the permuted operation $rop'_j$ if $b = 1$, between $C$ and $S$. At the end of each operation execution it resets the configuration of the client and server back to $C_{\mathrm{fin}}, S_{\mathrm{fin}}$ respectively, before the next execution.

• The adversary $\mathcal{A}$ is given the transcript of all the protocol executions in Stages I and II, and outputs a bit $\tilde{b}$, which we define as the output of the game. Note that, since the honest server $S$ is deterministic, seeing the protocol transcripts between $S$ and $C$ is the same as seeing the entire internal state of $S$ at any point in time.

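We require that, for every efficient $\mathcal{A}$, we have

$$\left|\Pr[\mathsf{NextReadGame}^0_{\mathcal{A}}(\lambda) = 1] - \Pr[\mathsf{NextReadGame}^1_{\mathcal{A}}(\lambda) = 1]\right| \le \mathrm{negl}(\lambda).$$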
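A toy run of the game shows why this definition rules out schemes that merely hide addresses behind a fixed secret permutation. The client class below is hypothetical and is not an ORAM: it picks one secret permutation $\sigma$ at initialization and never re-shuffles, so reads do not change its state and the rewinding in Stage II is trivial. The adversary takes $P_1$ to be OInit followed by one read of location 1, takes $P_2$ to be one read of location 1, and lets $\pi$ swap locations 1 and 2; it then compares the physical address seen in Stage II with the one seen in Stage I.

```python
import random
from typing import List


class StaticPermutationClient:
    """Hypothetical toy client: location i is always stored at physical address sigma(i)."""

    def __init__(self, length: int, rng: random.Random) -> None:
        self.sigma = list(range(1, length + 1))
        rng.shuffle(self.sigma)  # sigma[i - 1] is the physical address of location i

    def oread_transcript(self, locations: List[int]) -> List[int]:
        """What the server observes during ORead(locations): the physical addresses touched."""
        return [self.sigma[i - 1] for i in locations]


def next_read_game(b: int, length: int = 16, seed: int = 0) -> int:
    """One run of NextReadGame^b against the distinguisher described above; returns its guess."""
    client = StaticPermutationClient(length, random.Random(seed))
    stage1 = client.oread_transcript([1])         # Stage I: P1 = (OInit, ORead(1))
    pi = {1: 2, 2: 1}                             # adversary's permutation (identity elsewhere)
    rop = [1] if b == 0 else [pi[1]]              # Stage II: rop_1 or its permuted version
    stage2 = client.oread_transcript(rop)
    return 0 if stage2 == stage1 else 1           # guess b~ = 0 iff the same address repeats


# The guess is always right, so the advantage is 1 rather than negligible.
assert all(next_read_game(0, seed=s) == 0 and next_read_game(1, seed=s) == 1 for s in range(50))
```

Standard ORAM constructions move a location to a fresh random position after each access, which breaks this particular link between the Stage I and Stage II transcripts.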


5  PORAM:  Dynamic  PoR  via  ORAM 

We  now  give  our  construction  of  dynamic  PoR,  using  ORAM.  Since  the  ORAM  security  properties  are 
preserved  by  the  construction  as  well,  we  happen  to  achieve  ORAM  and  dynamic  PoR  simultaneously. 
Therefore,  we  call  our  construction  PORAM. 

⁸Traditionally, authenticity is not always defined or required for ORAM. However, it is crucial for our use. As noted in several prior works, it can often be added at almost no cost to efficiency. It can also be added generically by running a memory checking scheme on top of ORAM. See Section 6.4 for details.




Overview of Construction. Let $(\mathsf{Enc}, \mathsf{Dec})$ be an $(n, k, d = n - k + 1)_\Sigma$ systematic code with efficient erasure decoding over the alphabet $\Sigma = \{0,1\}^w$ (e.g., the systematic Reed-Solomon code over $\mathbb{F}_{2^w}$). Our construction of dynamic PoR will interpret the memory $M \in \Sigma^\ell$ as consisting of $L = \ell/k$ consecutive message blocks, each having $k$ alphabet symbols (assume $k$ is small and divides $\ell$). The construction implicitly maps operations on $M$ to operations on an encoded memory $C \in \Sigma^{\ell_{\mathrm{code}}}$ with $\ell_{\mathrm{code}} = Ln$, which consists of $L$ codeword blocks with $n$ alphabet symbols each. The $L$ codeword blocks $C = (c_1, \ldots, c_L)$ are simply the encoded versions of the corresponding message blocks in $M = (m_1, \ldots, m_L)$ with $c_q = \mathsf{Enc}(m_q)$ for $q \in [L]$. This means that, for each $i \in [\ell]$, the value of the memory location $M[i]$ can only affect the values of the encoded-memory locations $C[j+1], \ldots, C[j+n]$ where $j = n \cdot \lfloor i/k \rfloor$. Furthermore, since the encoding is systematic, we have $M[i] = C[j+u]$ where $u = (i \bmod k) + 1$. To read the memory location $M[i]$, the client will use ORAM to read the codeword location $C[j+u]$. To write to the memory location $M[i] := v$, the client needs to update the entire corresponding codeword block. She does so by first using ORAM to read the corresponding codeword block $c = (C[j+1], \ldots, C[j+n])$, and decodes it to obtain the original memory block $m = \mathsf{Dec}(c)$.⁹ She then locally updates the memory block by setting $m[u] := v$, re-encodes the updated memory block to get $c' = (c'_1, \ldots, c'_n) := \mathsf{Enc}(m)$ and uses the ORAM to write $c'$ back into the encoded memory, setting $C[j+1] := c'_1, \ldots, C[j+n] := c'_n$.
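The index arithmetic can be checked mechanically. The sketch below reads the formulas $j = n\lfloor i/k \rfloor$ and $u = (i \bmod k) + 1$ with 0-based memory indices $i \in \{0, \ldots, \ell-1\}$ and 1-based encoded positions; this reading is our assumption, since taking $i \in [\ell]$ literally would send $i = \ell$ to $C[\ell_{\mathrm{code}} + 1]$, one past the end. The helper names `locate` and `block_positions` are hypothetical.

```python
from typing import Tuple


def locate(i: int, n: int, k: int) -> Tuple[int, int]:
    """Return (j, u) = (n * floor(i / k), (i mod k) + 1) for a 0-based memory index i.

    The codeword block holding M[i] is C[j + 1], ..., C[j + n] (1-based), and,
    because the code is systematic, M[i] itself sits at C[j + u]."""
    return n * (i // k), (i % k) + 1


def block_positions(i: int, n: int, k: int) -> range:
    """The encoded positions that a write to M[i] must rewrite."""
    j, _ = locate(i, n, k)
    return range(j + 1, j + n + 1)


# Example: l = 12 memory words, k = 3, n = 5, so l_code = 20.
l, k, n = 12, 3, 5
l_code = n * (l // k)
homes = [sum(locate(i, n, k)) for i in range(l)]        # C-position j + u of each M[i]
assert len(set(homes)) == l                             # no two words collide
assert all(1 <= h <= l_code for h in homes)             # everything fits in C
assert all((h - 1) % n < k for h in homes)              # only systematic slots are used
assert all(h in block_positions(i, n, k) for i, h in enumerate(homes))
```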

The Construction. Our PORAM construction is defined for some parameters $n > k$, $t \in \mathbb{N}$. Let $O = (\mathsf{OInit}, \mathsf{ORead}, \mathsf{OWrite})$ be an ORAM. Let $(\mathsf{Enc}, \mathsf{Dec})$ be an $(n, k, d = n - k + 1)_\Sigma$ systematic code with efficient erasure decoding over the alphabet $\Sigma = \{0,1\}^w$ (e.g., the systematic Reed-Solomon code over $\mathbb{F}_{2^w}$).

• $\mathsf{PInit}(1^\lambda, 1^w, \ell)$: Assume $k$ divides $\ell$ and let $\ell_{\mathrm{code}} := n \cdot (\ell/k)$. Run the $\mathsf{OInit}(1^\lambda, 1^w, \ell_{\mathrm{code}})$ protocol.

• $\mathsf{PRead}(i)$: Let $i' := n \cdot \lfloor i/k \rfloor + (i \bmod k) + 1$ and run the $\mathsf{ORead}(i')$ protocol.

• $\mathsf{PWrite}(i, v)$: Set $j := n \cdot \lfloor i/k \rfloor$ and $u := (i \bmod k) + 1$.

  - Run $\mathsf{ORead}(j+1, \ldots, j+n)$ and get output $c = (c_1, \ldots, c_n)$.

  - Decode $m = (m_1, \ldots, m_k) = \mathsf{Dec}(c)$.

  - Modify position $u$ of $m$ by locally setting $m_u := v$. Re-encode the modified message block $m$ by setting $c' = (c'_1, \ldots, c'_n) := \mathsf{Enc}(m)$.

  - Run $\mathsf{OWrite}(j+1, \ldots, j+n; c'_1, \ldots, c'_n)$.

• Audit: Pick $t$ distinct indices $j_1, \ldots, j_t \in [\ell_{\mathrm{code}}]$ at random. Run $\mathsf{ORead}(j_1, \ldots, j_t)$ and return accept iff the protocol finished without outputting reject.

If any ORAM protocol execution in the above scheme outputs reject, the client enters a special rejection state in which it stops responding and automatically outputs reject for any subsequent protocol execution.
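The client-side logic of the construction fits in a short sketch. It rests on several stand-ins that are assumptions, not part of the scheme as specified: `ToyStore` is a hypothetical plain-list "ORAM" that is neither oblivious nor authenticated (it only fixes the multi-location read/write interface, with `None` standing for reject), the code is a $(k+1, k, 2)$ systematic XOR-parity code instead of Reed-Solomon, and memory indices are 0-based. It shows PInit, PRead, PWrite (read block, decode, update, re-encode, write) and Audit, together with the sticky rejection state.

```python
import random
from typing import List, Optional


class ToyStore:
    """Hypothetical stand-in for the ORAM O: multi-location reads/writes on a plain list.

    It is NOT oblivious and does not authenticate anything; oread returning None
    plays the role of the client outputting reject."""

    def __init__(self, l_code: int) -> None:
        self.cells = [0] * l_code            # position p (1-based) lives at cells[p - 1]

    def oread(self, *locs: int) -> Optional[List[int]]:
        return [self.cells[p - 1] for p in locs]

    def owrite(self, locs: List[int], vals: List[int]) -> None:
        for p, v in zip(locs, vals):
            self.cells[p - 1] = v


def parity_enc(m: List[int]) -> List[int]:
    """(k+1, k, 2) systematic XOR-parity code, a stand-in for Reed-Solomon."""
    parity = 0
    for x in m:
        parity ^= x
    return m + [parity]


def parity_dec(c: List[int]) -> List[int]:
    return list(c[:-1])                      # honest reads return no erasures


class PORAM:
    def __init__(self, l: int, k: int, t: int, seed: int = 0) -> None:
        if l % k:
            raise ValueError("k must divide l")
        self.k, self.n, self.t = k, k + 1, t
        self.l_code = self.n * (l // k)
        self.oram = ToyStore(self.l_code)    # PInit: OInit with l_code words
        self.rng = random.Random(seed)
        self.rejected = False                # the special rejection state

    def _oread(self, *locs: int) -> Optional[List[int]]:
        out = None if self.rejected else self.oram.oread(*locs)
        self.rejected = out is None          # once rejected, stays rejected
        return out

    def pread(self, i: int):                 # i is a 0-based memory index
        out = self._oread(self.n * (i // self.k) + (i % self.k) + 1)
        return "reject" if out is None else out[0]

    def pwrite(self, i: int, v: int) -> Optional[str]:
        j, u = self.n * (i // self.k), (i % self.k) + 1
        block = list(range(j + 1, j + self.n + 1))
        c = self._oread(*block)
        if c is None:
            return "reject"
        m = parity_dec(c)
        m[u - 1] = v                         # m_u := v
        self.oram.owrite(block, parity_enc(m))
        return None

    def audit(self) -> str:
        locs = self.rng.sample(range(1, self.l_code + 1), self.t)
        return "reject" if self._oread(*locs) is None else "accept"


poram = PORAM(l=6, k=3, t=4)
poram.pwrite(4, 9)
assert poram.pread(4) == 9 and poram.pread(0) == 0
assert poram.oram.cells == [0, 0, 0, 0, 0, 9, 0, 9]
assert poram.audit() == "accept"
```

With a real ORAM the same client code would run unchanged; only the store would hide which encoded positions are touched.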

It  is  easy  to  see  that  if  the  underlying  ORAM  scheme  used  in  the  above  PORAM  construction  is  secure  in 
the  standard  sense  of  ORAM  (see  Appendix  B)  then  the  above  construction  preserves  this  ORAM  security, 
hiding  which  locations  are  being  accessed  in  each  operation.  As  our  main  result,  we  now  prove  that  if  the 
ORAM  scheme  satisfies  next-read  pattern  hiding  (NRPH)  security  then  the  PORAM  construction  above  is 
also  a  secure  dynamic  PoR  scheme. 

Theorem 1. Assume that $O = (\mathsf{OInit}, \mathsf{ORead}, \mathsf{OWrite})$ is an ORAM with next-read pattern hiding (NRPH) security, and we choose parameters $k = \Omega(\lambda)$, $k/n = 1 - \Omega(1)$, $t = \Omega(\lambda)$. Then the above scheme PORAM = (PInit, PRead, PWrite, Audit) is a dynamic PoR scheme.


⁹We can skip this step if the client already has the value $m$ stored locally, e.g., from prior read executions.
5.1 Proof of Theorem 1


The correctness and authenticity properties of PORAM follow immediately from those of the underlying ORAM scheme $O$. The main challenge is to show that the retrievability property holds. As a first step, let us describe the extractor.

The Extractor. The extractor $\mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p)$ works as follows:

(1) Initialize $C := (\bot)^{\ell_{\mathrm{code}}}$, where $\ell_{\mathrm{code}} = n(\ell/k)$, to be an empty vector.

(2) Keep rewinding and auditing the server by repeating the following step for $s = \max(2\ell_{\mathrm{code}}, \lambda) \cdot p$ times:

Pick $t$ distinct indices $j_1, \ldots, j_t \in [\ell_{\mathrm{code}}]$ at random and run the protocol $\mathsf{ORead}(j_1, \ldots, j_t)$ with $\tilde{S}_{\mathrm{fin}}$, acting as $C_{\mathrm{fin}}$ as in the audit protocol. If the protocol is accepting and $C_{\mathrm{fin}}$ outputs $(v_1, \ldots, v_t)$, set $C[j_1] := v_1, \ldots, C[j_t] := v_t$. Rewind $\tilde{S}_{\mathrm{fin}}$ and $C_{\mathrm{fin}}$ to their state prior to this execution for the next iteration.

(3) Let $\delta \stackrel{\mathrm{def}}{=} (1 + k/n)/2$. If the number of “filled in” values in $C$ is $|\{j \in [\ell_{\mathrm{code}}] : C[j] \ne \bot\}| < \delta \cdot \ell_{\mathrm{code}}$, then output $\mathsf{fail}_1$. Else interpret $C$ as consisting of $L = \ell/k$ consecutive codeword blocks $C = (c_1, \ldots, c_L)$ with each block $c_j \in \Sigma^n$. If there exists some index $j \in [L]$ such that the number of “filled in” values in codeword block $c_j$ is $|\{i \in [n] : c_j[i] \ne \bot\}| < k$, then output $\mathsf{fail}_2$. Otherwise, apply erasure decoding to each codeword block $c_j$ to recover $m_j = \mathsf{Dec}(c_j)$, and output $M = (m_1, \ldots, m_L) \in \Sigma^\ell$.¹⁰
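A toy simulation of steps (1) to (3) shows the three possible outcomes. It is a sketch under simplifying assumptions: the server state $\tilde{S}_{\mathrm{fin}}$ is modeled as a hypothetical stateless function that answers a read correctly unless it touches a "lost" position (in which case the client rejects), so rewinding is free; the code is a $(k+1, k, 2)$ XOR-parity code standing in for Reed-Solomon; and the parameters are tiny, far from those of Theorem 1.

```python
import random
from typing import Callable, List, Optional, Set, Union

# Hypothetical toy model of S_fin: given the t locations of an ORead, it either
# answers with the correct symbols (accept) or returns None (the client rejects).
Server = Callable[[List[int]], Optional[List[int]]]


def parity_enc(m: List[int]) -> List[int]:
    parity = 0
    for x in m:
        parity ^= x
    return m + [parity]


def parity_dec(block: List[Optional[int]]) -> List[int]:
    """Erasure-decode a (k+1, k, 2) XOR-parity block with at most one None."""
    filled = list(block)
    if None in filled:
        acc = 0
        for x in block:
            if x is not None:
                acc ^= x
        filled[filled.index(None)] = acc     # all n symbols XOR to 0
    return filled[:-1]


def extract(server: Server, l_code: int, n: int, k: int, t: int, p: int, lam: int,
            rng: random.Random) -> Union[str, List[int]]:
    C: List[Optional[int]] = [None] * l_code                     # step (1)
    s = max(2 * l_code, lam) * p
    for _ in range(s):                                            # step (2)
        locs = rng.sample(range(1, l_code + 1), t)
        vals = server(locs)   # the toy server is stateless, so "rewinding" is free
        if vals is not None:
            for j, v in zip(locs, vals):
                C[j - 1] = v
    delta = (1 + k / n) / 2                                       # step (3)
    if sum(x is not None for x in C) < delta * l_code:
        return "fail1"
    blocks = [C[q:q + n] for q in range(0, l_code, n)]
    if any(sum(x is not None for x in b) < k for b in blocks):
        return "fail2"
    return [sym for b in blocks for sym in parity_dec(b)]


def lossy_server(C_true: List[int], lost: Set[int]) -> Server:
    """Rejects any read that touches a lost (1-based) position."""
    return lambda locs: None if lost & set(locs) else [C_true[j - 1] for j in locs]


M = list(range(1, 13))                                            # l = 12, k = 3, n = 4
C_true = [sym for q in range(0, 12, 3) for sym in parity_enc(M[q:q + 3])]
args = dict(l_code=16, n=4, k=3, t=2, p=8, lam=8)
assert extract(lossy_server(C_true, {2}), rng=random.Random(1), **args) == M
assert extract(lossy_server(C_true, {1, 2}), rng=random.Random(1), **args) == "fail2"
assert extract(lossy_server(C_true, set(range(1, 17))), rng=random.Random(1), **args) == "fail1"
```

In the second case the server still passes a 2-location audit with probability $91/120 \approx 0.76$, yet extraction fails with $\mathsf{fail}_2$ because both lost symbols sit in one codeword block. The toy store is not oblivious, so the server can aim its losses like this; the proof below uses next-read pattern hiding to argue that a server facing a real NRPH-secure ORAM gains nothing from trying.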

Proof by Contradiction. Assume that PORAM does not satisfy the retrievability property with the above extractor $\mathcal{E}$. Then there exists some efficient adversarial server $\tilde{S}$ and some polynomials $p = p(\lambda)$, $p' = p'(\lambda)$ such that, for infinitely many values $\lambda \in \mathbb{N}$, we have:

$$\Pr[\mathsf{ExtGame}_{\tilde{S},\mathcal{E}}(\lambda, p(\lambda)) = 1] > \frac{1}{p'(\lambda)} \qquad (1)$$

Using the same notation as in the definition of ExtGame, let $\tilde{S}_{\mathrm{fin}}, C_{\mathrm{fin}}$ be the final configurations of the malicious server $\tilde{S}$ and client $C$, respectively, after executing the protocol sequence $P$ chosen by the server at the beginning of the game, and let $M$ be the correct value of the memory contents resulting from $P$. Then (1) implies

$$\Pr\left[\mathrm{Succ}(\tilde{S}_{\mathrm{fin}}) \ge \frac{1}{p(\lambda)} \;\wedge\; \mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p) \ne M\right] > \frac{1}{p'(\lambda)} \qquad (2)$$

where the probability is over the coins of $C, \tilde{S}$, which determine the final configurations $\tilde{S}_{\mathrm{fin}}, C_{\mathrm{fin}}$, and the coins of the extractor $\mathcal{E}$. We now slowly refine the above inequality until we reach a contradiction, showing that the above cannot hold.

Extractor can only fail with $\{\mathsf{fail}_1, \mathsf{fail}_2\}$. First, we argue that at the conclusion of ExtGame, the extractor must either output the correct memory contents $M$ or fail with one of the error messages $\{\mathsf{fail}_1, \mathsf{fail}_2\}$. In other words, it can always detect failure and never outputs an incorrect value $M' \ne M$. This follows from the authenticity of the underlying ORAM scheme, which guarantees that the extractor never puts any incorrect value into the array $C$.

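Lemma 1. Within the execution of $\mathsf{ExtGame}_{\tilde{S},\mathcal{E}}(\lambda, p)$, we have:

$$\Pr\left[\mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p) \notin \{M, \mathsf{fail}_1, \mathsf{fail}_2\}\right] \le \mathrm{negl}(\lambda).$$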

Proof of Lemma. The only way that the above bad event can occur is if the extractor puts an incorrect value into its array $C$ which does not match the encoded version of the correct memory contents $M$. In particular, this

¹⁰The failure event $\mathsf{fail}_1$ and the choice of $\delta$ are only intended to simplify the analysis of the extractor. The only real bad event from which the extractor cannot recover is $\mathsf{fail}_2$.


means that one of the audit protocol executions (consisting of an ORead with $t$ random locations) initiated by the extractor $\mathcal{E}$ between the malicious server $\tilde{S}_{\mathrm{fin}}$ and the client $C_{\mathrm{fin}}$ causes the client to output some incorrect value which does not match the correct memory contents $M$, and not reject. By the correctness of the ORAM scheme, this means that the malicious server must have deviated from honest behavior during that protocol execution, without the client rejecting. Assume the probability of this bad event happening is $\rho$. Since the extractor runs $s = \max(2\ell_{\mathrm{code}}, \lambda) \cdot p = \mathrm{poly}(\lambda)$ such protocol executions with rewinding, there is at least a $\rho/s = \rho/\mathrm{poly}(\lambda)$ probability that the above bad event occurs on a single random execution of the audit with $\tilde{S}_{\mathrm{fin}}$. But this means that $\tilde{S}$ can be used to break the authenticity of ORAM with advantage $\rho/\mathrm{poly}(\lambda)$, by first running the requested protocol sequence $P$ and then deviating from honest behavior during a subsequent ORead protocol without being detected. Therefore, by the authenticity of ORAM, we must have $\rho = \mathrm{negl}(\lambda)$. □

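Combining the above with (2) we get:

$$\Pr\left[\mathrm{Succ}(\tilde{S}_{\mathrm{fin}}) \ge \frac{1}{p(\lambda)} \;\wedge\; \mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p) \in \{\mathsf{fail}_1, \mathsf{fail}_2\}\right] \ge \frac{1}{p'(\lambda)} - \mathrm{negl}(\lambda) \qquad (3)$$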


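Extractor can indeed only fail with $\mathsf{fail}_2$. Next, we refine equation (3) and claim that the extractor is unlikely to reach the failure event $\mathsf{fail}_1$ and therefore must fail with $\mathsf{fail}_2$:

$$\Pr\left[\mathrm{Succ}(\tilde{S}_{\mathrm{fin}}) \ge \frac{1}{p(\lambda)} \;\wedge\; \mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p) = \mathsf{fail}_2\right] \ge \frac{1}{p'(\lambda)} - \mathrm{negl}(\lambda) \qquad (4)$$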


To prove the above, it suffices to prove the following lemma, which intuitively says that if $\tilde{S}_{\mathrm{fin}}$ has a good chance of passing an audit, then the extractor must be able to extract sufficiently many values inside $C$ and hence cannot output $\mathsf{fail}_1$. Remember that $\mathsf{fail}_1$ occurs if the extractor does not have enough values to recover the whole memory, and $\mathsf{fail}_2$ occurs if the extractor does not have enough values to recover some message block.

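Lemma 2. For any (even inefficient) machine $\tilde{S}_{\mathrm{fin}}$ and any polynomial $p = p(\lambda)$ we have:

$$\Pr\left[\mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p) = \mathsf{fail}_1 \,\middle|\, \mathrm{Succ}(\tilde{S}_{\mathrm{fin}}) \ge 1/p\right] \le \mathrm{negl}(\lambda).$$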


Proof of Lemma. Let $E$ be the bad event that $\mathsf{fail}_1$ occurs. For each iteration $i \in [s]$ within step (2) of the execution of $\mathcal{E}$, let us define:

• $X_i$ to be an indicator random variable that takes on the value $X_i = 1$ iff the ORead protocol execution in iteration $i$ does not reject.

• $G_i$ to be a random variable that denotes the subset $\{j \in [\ell_{\mathrm{code}}] : C[j] \ne \bot\}$ of filled-in positions in the current version of $C$ at the beginning of iteration $i$.

• $Y_i$ to be an indicator random variable that takes on the value $Y_i = 1$ iff $|G_i| < \delta \cdot \ell_{\mathrm{code}}$ and all of the locations that $\mathcal{E}$ chooses to read in iteration $i$ happen to satisfy $j_1, \ldots, j_t \in G_i$.

If $X_i = 1$ and $Y_i = 0$ in iteration $i$, then either $|G_i| \ge \delta \cdot \ell_{\mathrm{code}}$ already holds or at least one new position of $C$ gets filled in, so that $|G_{i+1}| \ge |G_i| + 1$.

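Therefore the bad event $E$ only occurs if fewer than $\delta \cdot \ell_{\mathrm{code}}$ of the $X_i$ take on a 1 or at least one $Y_i$ takes on a 1, giving us:

$$\Pr[E] \le \Pr\left[\sum_{i=1}^{s} X_i < \delta \cdot \ell_{\mathrm{code}}\right] + \sum_{i=1}^{s} \Pr[Y_i = 1].$$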
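For each $i$, we can bound $\Pr[Y_i = 1] \le \binom{\delta \ell_{\mathrm{code}}}{t} \Big/ \binom{\ell_{\mathrm{code}}}{t} \le \delta^t$. If we define $\bar{X} = \frac{1}{s}\sum_{i=1}^{s} X_i$, we also get:

$$\Pr\left[\sum_{i=1}^{s} X_i < \delta \ell_{\mathrm{code}}\right] \le \Pr\left[\bar{X} < \frac{1}{p} - \left(\frac{1}{p} - \frac{\delta \ell_{\mathrm{code}}}{s}\right)\right] \le \exp\left(-2s\left(\frac{1}{p} - \frac{\delta \ell_{\mathrm{code}}}{s}\right)^2\right) \le \exp(-s/p) \le 2^{-\lambda}$$

where the second inequality follows by the Chernoff-Hoeffding bound. Therefore $\Pr[E] \le 2^{-\lambda} + s\delta^t = \mathrm{negl}(\lambda)$, which proves the lemma. □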
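The two quantitative ingredients of this proof can be evaluated numerically. The sketch below computes the exact probability behind the bound $\Pr[Y_i = 1] \le \binom{\delta\ell_{\mathrm{code}}}{t}/\binom{\ell_{\mathrm{code}}}{t} \le \delta^t$ for the largest set $G_i$ the event allows, and the final bound $2^{-\lambda} + s\delta^t$ for illustrative parameters. The parameter values ($k/n = 1/2$, $t = 400$, $\ell_{\mathrm{code}} = 2^{20}$, $p = 100$, $\lambda = 128$) are hypothetical choices, and the sketch does not check the Chernoff-Hoeffding step.

```python
import math


def y_bound(l_code: int, delta: float, t: int) -> float:
    """Exact Pr[t distinct uniform locations all fall in a fixed set G], where G is
    as large as the event Y_i allows (|G| strictly below delta * l_code)."""
    g = math.ceil(delta * l_code) - 1          # largest integer strictly below delta*l_code
    return math.comb(g, t) / math.comb(l_code, t)


def lemma2_bound(l_code: int, k: int, n: int, t: int, p: int, lam: int) -> float:
    """The final bound 2^-lambda + s * delta^t from the proof of Lemma 2."""
    delta = (1 + k / n) / 2
    s = max(2 * l_code, lam) * p
    return 2.0 ** (-lam) + s * delta ** t


# Small sanity check of the combinatorial step: C(g, t) / C(l_code, t) <= delta^t.
assert y_bound(1000, 0.75, 20) <= 0.75 ** 20

# Illustrative (hypothetical) parameters: k/n = 1/2 gives delta = 3/4.
print(lemma2_bound(l_code=2**20, k=128, n=256, t=400, p=100, lam=128))
```

For these values the $s\delta^t$ term is far below $2^{-128}$, so the printed bound is dominated by the $2^{-\lambda}$ term; the extractor's running time $s$ grows only linearly while $\delta^t$ shrinks exponentially in $t$.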


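Use Estimated Success Probability. Instead of looking at the true success probability $\mathrm{Succ}(\tilde{S}_{\mathrm{fin}})$, which we cannot efficiently compute, let us instead consider an estimated probability $\widetilde{\mathrm{Succ}}(\tilde{S}_{\mathrm{fin}})$, which is computed in the context of ExtGame by sampling $2\lambda(p(\lambda))^2$ different “audit protocol executions” between $\tilde{S}_{\mathrm{fin}}$ and $C_{\mathrm{fin}}$ and seeing on which fraction of them $\tilde{S}$ succeeds (while rewinding $\tilde{S}_{\mathrm{fin}}$ and $C_{\mathrm{fin}}$ after each one). Then, by the Chernoff-Hoeffding bound, we have:

$$\Pr\left[\widetilde{\mathrm{Succ}}(\tilde{S}_{\mathrm{fin}}) < \frac{1}{2p(\lambda)} \;\wedge\; \mathrm{Succ}(\tilde{S}_{\mathrm{fin}}) \ge \frac{1}{p(\lambda)}\right] \le e^{-\lambda} = \mathrm{negl}(\lambda)$$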


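Combining the above with (4), we get:

$$\Pr\left[\widetilde{\mathrm{Succ}}(\tilde{S}_{\mathrm{fin}}) \ge \frac{1}{2p(\lambda)} \;\wedge\; \mathcal{E}^{\tilde{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p) = \mathsf{fail}_2\right] \ge \frac{1}{p'(\lambda)} - \mathrm{negl}(\lambda) \qquad (5)$$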
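The estimate $\widetilde{\mathrm{Succ}}$ is an ordinary Monte Carlo average, and the choice of $2\lambda p^2$ samples is what makes the Hoeffding tail equal $e^{-\lambda}$ for a deviation of $1/(2p)$. The sketch below illustrates both points; `audit_once` is a hypothetical callback standing for one rewound audit execution that returns whether the client accepted.

```python
import math
import random
from typing import Callable


def estimate_success(audit_once: Callable[[random.Random], bool], p: int, lam: int,
                     rng: random.Random) -> float:
    """Fraction of accepting audits over 2 * lam * p^2 independent (rewound) runs."""
    trials = 2 * lam * p * p
    return sum(audit_once(rng) for _ in range(trials)) / trials


def hoeffding_tail(trials: int, eps: float) -> float:
    """Hoeffding bound on Pr[empirical mean <= true mean - eps] for [0, 1]-valued trials."""
    return math.exp(-2 * trials * eps * eps)


# With 2*lam*p^2 trials and eps = 1/(2p), the tail is exactly e^{-lam}.
lam, p = 40, 5
assert math.isclose(hoeffding_tail(2 * lam * p * p, 1 / (2 * p)), math.exp(-lam))

# Hypothetical server that passes each audit with probability 1/2 = 1/p for p = 2.
est = estimate_success(lambda r: r.random() < 0.5, p=2, lam=64, rng=random.Random(3))
assert est >= 1 / (2 * 2)
```

Halving the threshold from $1/p$ to $1/(2p)$ leaves room for sampling error, which is why (5) is stated with $1/(2p(\lambda))$.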


Assume Passive Attacker. We now argue that we can replace the active attacker $\tilde{S}$ with an efficient passive attacker $\hat{S}$ who always acts as the honest server $S$ in each protocol execution within the protocol sequence $P$ and the subsequent audit, but can selectively fail by outputting $\bot$ at any point. In particular, $\hat{S}$ just runs a copy of $\tilde{S}$ and the honest server $S$ concurrently, and if $\tilde{S}$ deviates from the execution of $S$, it just outputs $\bot$. Then we claim that, within the context of $\mathsf{ExtGame}_{\hat{S},\mathcal{E}}$, we have:


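$$\Pr\left[\widetilde{\mathrm{Succ}}(\hat{S}_{\mathrm{fin}}) \ge \frac{1}{2p(\lambda)} \;\wedge\; \mathcal{E}^{\hat{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^\ell, 1^p) = \mathsf{fail}_2\right] \ge \frac{1}{p'(\lambda)} - \mathrm{negl}(\lambda) \qquad (6)$$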


The above probability is equivalent for $\tilde{S}$ and $\hat{S}$, up to $\tilde{S}$ deviating from the protocol execution without being detected by the client, either during the protocol execution of $P$ or during one of the polynomially many executions of the next read used to compute $\widetilde{\mathrm{Succ}}$ and to run $\mathcal{E}$. The probability that this occurs is negligible, by the authenticity of ORAM.
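The passive attacker $\hat{S}$ is a simple wrapper, sketched below under a modeling assumption: each server is represented as a hypothetical function from a client message to a reply (a stateful server keeps its state inside the function), and `None` stands for $\bot$. The wrapper forwards every client message to both the malicious and the honest server, relays the honest reply while the two agree, and aborts from the first disagreement onward. Rewinding is not modeled here.

```python
from typing import Callable, Optional

ServerFn = Callable[[bytes], bytes]   # hypothetical: one server reply per client message


class PassiveWrapper:
    """S-hat: feeds every client message to both the malicious and the honest server,
    relays the honest reply while they agree, and outputs None (standing for ⊥) from
    the first deviation onward."""

    def __init__(self, malicious: ServerFn, honest: ServerFn) -> None:
        self.malicious, self.honest = malicious, honest
        self.aborted = False

    def respond(self, msg: bytes) -> Optional[bytes]:
        if self.aborted:
            return None
        bad, good = self.malicious(msg), self.honest(msg)
        if bad != good:
            self.aborted = True
            return None
        return good


honest = lambda m: m[::-1]
malicious = lambda m: b"forged" if m == b"bad" else m[::-1]
w = PassiveWrapper(malicious, honest)
assert w.respond(b"abc") == b"cba"     # agrees with the honest server: relayed
assert w.respond(b"bad") is None       # first deviation: abort with ⊥
assert w.respond(b"abc") is None       # stays aborted
```

Whenever the malicious server's deviation would have gone unnoticed, the wrapper aborts instead, so the two servers' behaviors differ only on the event that authenticity makes negligible.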


Permuted Extractor. We now aim to derive a contradiction from (6). Intuitively, if $\mathsf{fail}_2$ occurs (but $\mathsf{fail}_1$ does not), then there is some codeword block $c_j$ such that $\hat{S}_{\mathrm{fin}}$ is significantly more likely to fail on a next-read query for which at least one location falls inside $c_j$ than on a “random” read query. This would imply an attack on next-read pattern hiding. We now make this intuition formal. Consider a modified “permuted extractor” $\mathcal{E}_{\mathrm{perm}}$ that works just like $\mathcal{E}$, except that it permutes the locations used in the ORead executions during the extraction process. In particular, $\mathcal{E}_{\mathrm{perm}}$ makes the following modifications to $\mathcal{E}$:

•  At the beginning, $\mathcal{E}_{\mathrm{perm}}$ chooses a random permutation $\pi : [\ell_{\mathrm{code}}] \to [\ell_{\mathrm{code}}]$.

•  During each of the $s$ iterations of the audit protocol, $\mathcal{E}_{\mathrm{perm}}$ chooses $t$ indices $j_1, \ldots, j_t \in [\ell_{\mathrm{code}}]$ at random as before, but then runs $\mathrm{ORead}(\pi(j_1), \ldots, \pi(j_t))$ on the permuted values. If the protocol is accepting, $\mathcal{E}_{\mathrm{perm}}$ still “fills in” the original locations $C[j_1], \ldots, C[j_t]$. (Since we are only analyzing the event $\mathsf{fail}_2$, we do not care about the values in these locations, only whether or not they are filled in.)
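To make the two modifications concrete, here is a minimal Python sketch of the permuted extractor's fill-in loop. It is illustrative only: the callback `run_oread` is a hypothetical stand-in for one rewound ORead execution against the server state $\hat{S}_{\mathrm{fin}}$ (returning the values read if the client accepts, and `None` otherwise), indices are 0-based, and `None` plays the role of $\perp$.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: one rewound ORead execution on the given locations,
# returning the values read if the client accepts, or None if it rejects.
OReadCallback = Callable[[Sequence[int]], Optional[List[bytes]]]


def permuted_extract(l_code: int, s: int, t: int, run_oread: OReadCallback,
                     rng: random.Random) -> Tuple[List[Optional[bytes]], List[int]]:
    """Fill-in loop of the permuted extractor (0-based indices, None plays ⊥)."""
    pi = list(range(l_code))
    rng.shuffle(pi)                              # random permutation pi of [l_code]
    C: List[Optional[bytes]] = [None] * l_code
    for _ in range(s):                           # s audit iterations
        js = rng.sample(range(l_code), t)        # t distinct random indices, as before
        answer = run_oread([pi[j] for j in js])  # ...but read the permuted locations
        if answer is not None:                   # accepting execution:
            for j, value in zip(js, answer):
                C[j] = value                     # fill in the ORIGINAL locations
    return C, pi                                 # pi returned only for inspection
```

With a server that always answers, each `C[j]` ends up holding whatever was read at location `pi[j]`, which is exactly the decoupling between queried and filled-in locations that the proof exploits.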




Now we claim that, in an execution of ExtGame with the permuted extractor $\mathcal{E}_{\mathrm{perm}}$, the failure event $\mathsf{fail}_2$ is still likely to occur. This follows from next-read pattern hiding, which ensures that the server cannot distinguish whether the locations inside the ORead executions (with rewinding) have been permuted.


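Lemma 3. The following holds within ExtGame:

$$\Pr\left[\mathrm{Succ}(\hat{S}_{\mathrm{fin}}) > \frac{1}{2p(\lambda)} \;\wedge\; \mathcal{E}_{\mathrm{perm}}^{\hat{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^{\ell}, 1^{p}) = \mathsf{fail}_2\right] \ge \frac{1}{p'(\lambda)} - \mathrm{negl}(\lambda) \qquad (7)$$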


Proof of Lemma. Assume that (7) does not hold. Then we claim that there is an adversary A with non-negligible distinguishing advantage in the next-read pattern hiding game $\mathsf{NextReadGame}_{\mathcal{A}}$ against the ORAM.

The adversary A runs $\hat{S}$, which chooses a PoR protocol sequence $P_1 = (op_1, \ldots, op_{q_1})$, and A translates this into the corresponding ORAM protocol sequence, as defined by the PORAM scheme. Then A chooses its own sequence $P_2 = (rop_1, \ldots, rop_{q_2})$ of sufficiently many read operations $\mathrm{ORead}(i_1, \ldots, i_t)$, where $i_1, \ldots, i_t \in [\ell_{\mathrm{code}}]$ are random distinct indices. It then passes $P_1, P_2$ to its challenger and gets back the transcripts of the protocol executions for stages (I) and (II) of the game.

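The adversary A then uses the client communication from the stage (I) transcript to run $\hat{S}$, getting it into some state $\hat{S}_{\mathrm{fin}}$. It then uses the stage (II) transcripts to determine whether $\mathcal{E}^{\hat{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^{\ell}, 1^{p}) = \mathsf{fail}_2$ and to estimate $\mathrm{Succ}(\hat{S}_{\mathrm{fin}})$, without knowing the client state $C_{\mathrm{fin}}$. It does so simply by checking on which executions $\hat{S}_{\mathrm{fin}}$ aborts with $\perp$ and which it runs to completion (here we use that $\hat{S}$ is semi-honest and never deviates beyond outputting $\perp$). Lastly, A outputs 1 iff the emulated extraction gives $\mathcal{E}^{\hat{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^{\ell}, 1^{p}) = \mathsf{fail}_2$ and $\mathrm{Succ}(\hat{S}_{\mathrm{fin}}) > 1/(2p(\lambda))$.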
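Let b be the challenger's bit in the next-read pattern hiding game. If b = 0 (not permuted), then A perfectly emulates the distribution of the event $\mathcal{E}^{\hat{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^{\ell}, 1^{p}) = \mathsf{fail}_2$ and of the estimate of $\mathrm{Succ}(\hat{S}_{\mathrm{fin}})$, so by inequality (6):

$$\Pr[\mathsf{NextReadGame}^{0}_{\mathcal{A}} = 1] \ge \frac{1}{p'(\lambda)} - \mathrm{negl}(\lambda).$$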

If b = 1 (permuted), then A perfectly emulates the distribution of the event $\mathcal{E}_{\mathrm{perm}}^{\hat{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^{\ell}, 1^{p}) = \mathsf{fail}_2$ for the permuted extractor, as well as the estimate of $\mathrm{Succ}(\hat{S}_{\mathrm{fin}})$, since for the latter it does not matter whether random reads are permuted or not. Therefore, since (7) is false by assumption, we have

$$\Pr[\mathsf{NextReadGame}^{1}_{\mathcal{A}} = 1] < \frac{1}{p'(\lambda)} - \rho(\lambda),$$

where $\rho(\lambda)$ is non-negligible. Hence the distinguishing advantage of the passive attacker A in the next-read pattern hiding game is at least $\rho(\lambda) - \mathrm{negl}(\lambda)$, which is non-negligible. This proves the lemma. □


Contradiction. Finally, we present an information-theoretic argument showing that, when using the permuted extractor $\mathcal{E}_{\mathrm{perm}}$, the probability of $\mathsf{fail}_2$ is negligible over the choice of the permutation $\pi$. Together with inequality (7), this gives us a contradiction.

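Lemma 4. For any (possibly unbounded) $\hat{S}$, we have

$$\Pr\left[\mathrm{Succ}(\hat{S}_{\mathrm{fin}}) > \frac{1}{2p(\lambda)} \;\wedge\; \mathcal{E}_{\mathrm{perm}}^{\hat{S}_{\mathrm{fin}}}(C_{\mathrm{fin}}, 1^{\ell}, 1^{p}) = \mathsf{fail}_2\right] \le \mathrm{negl}(\lambda).$$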


Proof of Lemma. First, note that an equivalent way of thinking about $\mathcal{E}_{\mathrm{perm}}$ is to have it issue random (unpermuted) read queries just like $\mathcal{E}$ to recover $C$, but then permute the locations of $C$ via some permutation $\pi : [\ell_{\mathrm{code}}] \to [\ell_{\mathrm{code}}]$ before testing for the event $\mathsf{fail}_2$. This is simply because of the distributional equivalence $(\pi(\mathsf{random}), \mathsf{random}) \equiv (\mathsf{random}, \pi(\mathsf{random}))$, where $\mathsf{random}$ represents the randomly chosen locations for the audit and $\pi$ is a random permutation. With this interpretation of $\mathcal{E}_{\mathrm{perm}}$, the event $\mathsf{fail}_2$ occurs only if (I) the unpermuted $C$ contains more than a $\delta$ fraction of locations with filled-in (non-$\perp$) values, so that $\mathsf{fail}_1$ does not occur, and (II) the permuted version $(c_1, \ldots, c_L) = (C[\pi(1)], \ldots, C[\pi(\ell_{\mathrm{code}})])$, split into $L$ codeword blocks of length $n$, contains some codeword block $c_j$ with less than a $k/n$ fraction of filled-in (non-$\perp$) values.

We now show that, conditioned on (I), the probability of (II) is negligible over the random choice of $\pi$. Fix some index $j \in [L]$ and let us bound the probability that $c_j$ is the “bad” codeword block with fewer than $k$ filled-in values. Let $X_1, X_2, \ldots, X_n$ be random variables where $X_i$ is 1 if $c_j[i] \ne \perp$ and 0 otherwise, and let $X = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then, over the randomness of $\pi$, the random variables $X_1, \ldots, X_n$ are sampled without replacement from a population of $\ell_{\mathrm{code}}$ values (the locations in $C$), at least $\delta\ell_{\mathrm{code}}$ of which are 1 ($\ne \perp$) and the rest are 0 ($= \perp$); in particular, $\mathbb{E}[X] \ge \delta$. Therefore, by Hoeffding's bound for sampling from finite populations without replacement (see Section 6 of [18]), we have:

$$\Pr[c_j \text{ is bad}] = \Pr[X < k/n] = \Pr[X < \delta - (\delta - k/n)] \le \exp\left(-2n(\delta - k/n)^2\right) = \mathrm{negl}(\lambda).$$

By taking a union bound over all codeword blocks $c_j$, and since $L$ is polynomial in $\lambda$, we can bound the probability in equation (7) by

$$\sum_{j=1}^{L} \Pr[c_j \text{ is bad}] \le \mathrm{negl}(\lambda).$$

We have already shown that $\mathsf{fail}_1$ occurs only with negligible probability. We have now shown that $\mathsf{fail}_2$ for the permuted extractor also occurs with negligible probability, while the adversary succeeds with non-negligible probability. □
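As a numerical sanity check of the tail bound used in the proof of Lemma 4, the following Python sketch compares the exact probability that one block is bad (a hypergeometric lower tail, since the block's $n$ locations are drawn without replacement) with Hoeffding's bound $\exp(-2n(\delta - k/n)^2)$. The function names are hypothetical, and the parameter values in the usage section are arbitrary illustrations, not the scheme's actual parameters.

```python
from math import comb, exp


def exact_bad_block_prob(n_code: int, filled: int, n: int, k: int) -> float:
    """Pr[fewer than k of n locations are filled], sampling without replacement
    from n_code locations of which `filled` are non-⊥ (hypergeometric tail)."""
    favorable = sum(comb(filled, x) * comb(n_code - filled, n - x) for x in range(k))
    return favorable / comb(n_code, n)


def hoeffding_bound(n: int, delta: float, k: int) -> float:
    """Hoeffding's bound exp(-2 n (delta - k/n)^2); requires delta > k/n."""
    gap = delta - k / n
    if gap <= 0:
        raise ValueError("bound only applies when delta > k/n")
    return exp(-2 * n * gap * gap)


if __name__ == "__main__":
    n_code, n, k = 1000, 100, 50      # illustrative: L = 10 blocks, rate k/n = 1/2
    filled = 800                      # a delta = 0.8 fraction of C is filled in
    exact = exact_bad_block_prob(n_code, filled, n, k)
    bound = hoeffding_bound(n, filled / n_code, k)
    blocks = n_code // n
    print(f"exact per-block: {exact:.3e}, Hoeffding: {bound:.3e}")
    print(f"union bound over {blocks} blocks: {blocks * bound:.3e}")
```

The exact tail is typically far below Hoeffding's bound; the proof only needs the bound, which is valid for sampling without replacement.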

Combining the above lemma with equation (7), we get a contradiction, showing that the assumption in equation (1) cannot hold. Thus, as long as the adversary succeeds with non-negligible probability during audits, the extractor will also succeed with non-negligible probability in extracting the whole memory contents correctly.

6  ORAM  Instantiation 

The notion of ORAM was introduced by Goldreich and Ostrovsky [13], who also introduced the so-called hierarchical scheme having the structure seen in Figures 1 and 2. Since then, several improvements to the hierarchical scheme have been given, including improved rebuild phases and the use of advanced hashing techniques [31, 26, 15].

We  examine  a  particular  ORAM  scheme  of  Goodrich  and  Mitzenmacher  [15]  and  show  that  (with  minor 
modifications)  it  satisfies  next-read  pattern  hiding  security.  Therefore,  this  scheme  can  be  used  to  instantiate 
our  PORAM  construction.  We  note  that  most  other  ORAM  schemes  from  the  literature  that  follow  the 
hierarchical  structure  also  seemingly  satisfy  next-read  pattern  hiding,  and  we  only  focus  on  the  above 
example  for  concreteness.  However,  in  Appendix  C,  we  show  that  it  is  not  the  case  that  every  ORAM 
scheme  satisfies  next-read  pattern  hiding,  and  in  fact  give  an  example  of  a  contrived  scheme  which  does  not 
satisfy  this  notion  and  makes  our  construction  of  PORAM  completely  insecure.  We  also  believe  that  there 
are  natural  schemes,  such  as  the  ORAM  of  Shi  et  al.  [28],  which  do  not  satisfy  this  notion.  Therefore, 
next-read  pattern  hiding  is  a  meaningful  property  beyond  standard  ORAM  security  and  must  be  examined 
carefully. 

Overview.  We  note  that  ORAM  schemes  are  generally  not  described  as  protocols,  but  simply  as  a  data 
structure  in  which  the  client’s  encrypted  data  is  stored  on  the  server.  Each  time  that  a  client  wants  to 
perform  a  read  or  write  to  some  address  i  of  her  memory,  this  operation  is  translated  into  a  series  of 
read/write  operations  on  this  data  structure  inside  the  server’s  storage.  In  other  words,  the  (honest)  server 
does  not  perform  any  computation  at  all  during  these  ‘protocols’,  but  simply  allows  the  client  to  access 
arbitrary  locations  inside  this  data  structure. 

Most  ORAM  schemes,  including  the  one  we  will  use  below,  follow  a  hierarchical  structure.  They  maintain 
several  levels  of  hash  tables  on  the  server,  each  holding  encrypted  address-value  pairs,  with  lower  tables 
having  higher  capacity.  The  tables  are  managed  so  that  the  most  recently  accessed  data  is  kept  in  the  top
tables  and  the  least  recently  used  data  is  kept  in  the  bottom  tables.  Over  time,  infrequently  accessed  data 
is  moved  into  lower  tables  (obliviously). 

To  write  a  value  to  some  address,  just  insert  the  encrypted  address-value  pair  in  the  top  table.  To  read 
the  value  at  some  address,  one  hashes  the  address  and  checks  the  appropriate  position  in  the  top  table.  If  it 
is  found  in  that  table,  then  one  hides  this  fact  by  sequentially  checking  random  positions  in  the  remaining 
tables.  If  it  is  not  found  in  the  top  table,  then  one  hashes  the  address  again  and  checks  the  second  level  table, 
continuing  down  the  list  until  it  is  found,  and  then  accessing  random  positions  in  the  remaining  tables.  Once 
all  of  the  tables  have  been  accessed,  the  found  data  is  written  into  the  top  table.  To  prevent  tables  from 
overflowing (due to too many item insertions), there are additional periodic rebuild phases which obliviously move data from the smaller tables to larger tables further down.

Security Intuition. The reason that we always write found data into the top table after any read is
to protect the privacy of repeatedly reading the same address, ensuring that this looks the same as
reading  various  different  addresses.  In  particular,  reading  the  same  address  twice  will  not  need  to  access  the 
same  locations  on  the  server,  since  after  the  first  read,  the  data  will  already  reside  in  the  top  table,  and  the 
random  locations  will  be  read  at  lower  tables. 

At  any  point  in  time,  after  the  server  observes  many  read/write  executions,  any  subsequent  read  operation 
just  accesses  completely  random  locations  in  each  table,  from  the  point  of  view  of  the  server.  This  is  the 
main  observation  needed  to  argue  standard  pattern  hiding.  For  next-read  pattern  hiding,  we  notice  that 
we  can  extend  the  above  to  any  set  of  q  distinct  executions  of  a  subsequent  read  operation  with  distinct 
addresses  (each  execution  starting  in  the  same  client/server  state).  In  particular,  each  of  the  q  operations 
just  accesses  completely  random  locations  in  each  table,  independently  of  the  other  operations,  from  the 
point  of  view  of  the  server. 

One  subtlety  comes  up  when  the  addresses  are  not  completely  distinct  from  each  other,  as  is  the  case  in 
our  definition  where  each  address  can  appear  in  multiple  separate  multi-read  operations.  The  issue  is  that 
doing  a  read  operation  on  the  same  address  twice  with  rewinding  will  reveal  the  level  at  which  the  data 
for that address is stored, thus revealing some information about which address is being accessed: one can simply observe at which level the accesses begin to differ in the two executions. We fix this issue by modifying the scheme so that, instead of accessing freshly chosen random positions in lower tables once the
correct  value  is  found,  we  instead  access  pseudorandom  positions  that  are  determined  by  the  address  being 
read  and  the  operation  count.  That  way,  any  two  executions  which  read  the  same  address  starting  from  the 
same  client  state  are  exactly  the  same  and  do  not  reveal  anything  beyond  this.  Note  that,  without  state 
rewinds,  this  still  provides  regular  pattern  hiding. 
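The effect of this modification can be illustrated with a small Python sketch. Here HMAC-SHA256 stands in for the PRF (an assumption; the scheme only requires some secure PRF), and `dummy_slot` and `dummy_trace` are hypothetical helpers that map an address and the operation counter to positions in the lower tables (one key and one position per level, a simplification of the two tables per level used by the scheme). Because each position is a deterministic function of the key, the address, and the counter, rerunning a read from the same client state reproduces exactly the same accesses, while an operation with an incremented counter lands on fresh-looking positions.

```python
import hashlib
import hmac


def dummy_slot(key: bytes, address: int, ctr: int, m: int) -> int:
    """Pseudorandom position in [0, m) determined by (address, ctr).

    Fixed-width encodings keep address || ctr unambiguous. Reducing the
    256-bit HMAC output mod m has negligible bias when m is much smaller than 2**256.
    """
    msg = address.to_bytes(8, "big") + ctr.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % m


def dummy_trace(level_keys: list, address: int, ctr: int,
                found_level: int, table_size: int) -> list:
    """Positions probed at every level below the one where the address was found."""
    return [
        (level, dummy_slot(level_keys[level], address, ctr, table_size))
        for level in range(found_level + 1, len(level_keys))
    ]


keys = [bytes([lvl]) * 32 for lvl in range(4)]       # illustrative fixed keys
first = dummy_trace(keys, address=42, ctr=7, found_level=0, table_size=2**32)
rewound = dummy_trace(keys, address=42, ctr=7, found_level=0, table_size=2**32)
assert first == rewound        # rerunning from the same state reveals nothing new
later = dummy_trace(keys, address=42, ctr=8, found_level=0, table_size=2**32)
```

Had the dummy positions been drawn with fresh randomness instead, `first` and `rewound` would differ exactly at the levels below the one holding the data, which is the leak described above.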

6.1  Technical  Tools 

Our construction uses the standard notion of a pseudorandom function (PRF), where $F(K, x)$ denotes the evaluation of the PRF $F$ on input $x$ with key $K$. We also rely on a symmetric-key encryption scheme secure against chosen-plaintext attacks, and let $\mathrm{Enc}(K, \cdot)$, $\mathrm{Dec}(K, \cdot)$ denote the encryption/decryption algorithms with key $K$.

Encrypted cuckoo table. An encrypted cuckoo table [25, 20] consists of three arrays $(T_1, T_2, S)$ that hold ciphertexts of some fixed length. The arrays $T_1$ and $T_2$ are both of size $m$ and serve as cuckoo-hash tables, while $S$ is an array of size $s$ and serves as an auxiliary stash. The data structure uses two hash functions $h_1, h_2 : [\ell] \to [m]$. Initially, all entries of the arrays are populated with independent encryptions of a special symbol $\perp$. To retrieve a ciphertext associated with an address $i$, one decrypts all of the ciphertexts in $S$, as well as the ciphertexts at $T_1[h_1(i)]$ and $T_2[h_2(i)]$ (thus at most $s + 2$ decryptions are performed). If any of these ciphertexts decrypts to a value of the form $(i, v)$, then $v$ is the returned output. To insert an address-value pair $(i, v)$, encrypt it and write the ciphertext $ct$ to position $T_1[h_1(i)]$, retrieving whatever ciphertext $ct_1$ was there before. If the original ciphertext $ct_1$ decrypts to $\perp$, then stop. Otherwise, if $ct_1$ decrypts to a pair $(j, w)$, then re-encrypt the pair and write the resulting ciphertext to $T_2[h_2(j)]$, again retrieving whatever ciphertext $ct_2$ was there before. If $ct_2$ decrypts to $\perp$, then stop; otherwise, continue this process iteratively with ciphertexts $ct_3, ct_4, \ldots$, alternating between the two tables. If this process continues for $t = c \log n$ steps, then “give up” and put the last evicted ciphertext $ct_t$ into the first available spot in the stash $S$. If $S$ is full, then the data structure fails.
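The insertion and lookup procedures can be sketched in Python as follows. This is a plaintext sketch under simplifying assumptions: encryption and re-encryption are omitted, `None` stands for an encryption of $\perp$, the hash functions are passed in as arbitrary callables, and `max_steps` plays the role of $t = c \log n$. Inserting an address that is already present is not handled, since in the scheme duplicates are removed during rebuilds. The class name `CuckooTable` is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Entry = Optional[Tuple[int, bytes]]   # None stands for an encryption of ⊥


class CuckooTable:
    """Cuckoo table (T1, T2, S) with a stash; encryption is omitted in this sketch."""

    def __init__(self, m: int, s: int, h1: Callable[[int], int],
                 h2: Callable[[int], int], max_steps: int) -> None:
        self.T1: List[Entry] = [None] * m
        self.T2: List[Entry] = [None] * m
        self.S: List[Entry] = [None] * s
        self.h1, self.h2, self.max_steps = h1, h2, max_steps

    def lookup(self, i: int) -> Optional[bytes]:
        # Inspect the whole stash plus one slot in each table: at most s + 2 reads.
        for entry in self.S + [self.T1[self.h1(i)], self.T2[self.h2(i)]]:
            if entry is not None and entry[0] == i:
                return entry[1]
        return None

    def insert(self, i: int, v: bytes) -> None:
        item: Entry = (i, v)
        for step in range(self.max_steps):
            # Alternate between T1[h1(.)] and T2[h2(.)], evicting the occupant.
            table, h = (self.T1, self.h1) if step % 2 == 0 else (self.T2, self.h2)
            pos = h(item[0])
            item, table[pos] = table[pos], item
            if item is None:
                return                       # the evicted slot held ⊥: done
        # Gave up after max_steps evictions: stash the last evicted item.
        for idx, slot in enumerate(self.S):
            if slot is None:
                self.S[idx] = item
                return
        raise RuntimeError("stash full: the data structure fails")
```

For example, with `m = 8`, `h1 = lambda i: i % 8` and `h2 = lambda i: (i // 8) % 8`, inserting addresses 0 through 9 triggers a short eviction chain for 8 and 9, after which every address is still found by `lookup`.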

We will use the following result sketched in [15]: if $m = (1 + \epsilon)n$ for some constant $\epsilon > 0$, and $h_1, h_2$ are random functions, then after $n$ items are inserted, the probability that $S$ has $k$ or more items written into it is $O(1/n^{k+2})$. Thus, if $S$ has at least $\lambda$ slots, then the probability of a failure after $n$ insertions is negligible in $\lambda$.


Oblivious table rebuilds. We will assume an oblivious protocol for the following task. At the start of the protocol, the server holds encrypted cuckoo hash tables $C_1, \ldots, C_r$. The client has two hash functions $h_1, h_2$. After the oblivious interaction, the server holds a new cuckoo hash table $C'_r$ that results from decrypting the data in $C_1, \ldots, C_r$, deleting data for duplicated locations with preference given to the copy of the data in the lowest-index table, encrypting each index-value pair again, and then inserting the ciphertexts into $C'_r$ using $h_1, h_2$.

Implementing this task efficiently and obliviously is intricate. See [15] and [26] for different methods, which adapt the oblivious sorting technique first introduced in [13].
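The input/output behaviour required of a rebuild (not its obliviousness) can be pinned down with a short Python sketch. The function name `rebuild_contents` is hypothetical, and this version deliberately ignores obliviousness: it only computes which index-value pairs $C'_r$ must contain, before they are re-encrypted and inserted with the new $h_1, h_2$.

```python
from typing import Dict, List, Tuple


def rebuild_contents(levels: List[List[Tuple[int, bytes]]]) -> Dict[int, bytes]:
    """Pairs that C'_r must hold after rebuilding C_1, ..., C_r (levels[0] is C_1).

    When an index appears in several tables, the copy in the lowest-index table
    wins. This captures only the functionality; it is NOT oblivious (a real
    rebuild uses oblivious sorting, as in the methods cited above).
    """
    merged: Dict[int, bytes] = {}
    for table in levels:                    # C_1 first, so it has priority
        for index, value in table:
            merged.setdefault(index, value)  # keep the first (lowest-level) copy
    return merged
```

Since the top tables hold the most recently accessed data, preferring the lowest-index copy is what keeps the freshest value of each address.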


6.2  ORAM  Scheme 


We can now describe the scheme of Goodrich and Mitzenmacher [15], with our modifications for next-read pattern hiding. As ingredients, this scheme will use a PRF $F$ and an encryption scheme (Enc, Dec). A visualization of the server's data structures is given in Figure 2.

$\mathrm{OInit}(1^{\lambda}, 1^{w}, \ell)$: Let $L$ be the smallest integer such that $2^L \ge \ell$. The client chooses $2L$ random keys $K_{1,1}, K_{1,2}, \ldots, K_{L,1}, K_{L,2}$ and $2L$ additional random keys $R_{1,1}, R_{1,2}, \ldots, R_{L,1}, R_{L,2}$ to be used for pseudorandom functions, and initializes a counter ctr to 0. It also selects an encryption key for the IND-CPA secure scheme. It instructs the server to allocate the following data structures:


•  An empty array $A_0$ that will change size as it is used.

•  $L$ empty cuckoo hash tables $C_1, \ldots, C_L$, where the parameters in $C_j$ are adjusted to hold $2^j$ data items with a negligible (in $\lambda$) probability of overflow when used with random hash functions.

The client state consists of all of the keys $(K_{j,1}, K_{j,2})_{j \in [L]}$, $(R_{j,1}, R_{j,2})_{j \in [L]}$, the encryption key, and ctr.


$\mathrm{ORead}(i_1, \ldots, i_t)$: The client starts by initializing an array found of $t$ flags to false. For each index $i_j$ to be read, the client does the following. For each level $k = 1, \ldots, L$, the client executes:

•  Let $C_k = (T_1^{(k)}, T_2^{(k)}, S^{(k)})$.

•  If found[j] = false, read and decrypt all of $S^{(k)}$, $T_1^{(k)}[F(K_{k,1}, i_j)]$ and $T_2^{(k)}[F(K_{k,2}, i_j)]$. If the data is in any of these slots, set found[j] to true and remember the value as $v_j$.

•  Else, if found[j] = true, then instead read all of $S^{(k)}$, $T_1^{(k)}[F(R_{k,1}, i_j \| \mathsf{ctr})]$ and $T_2^{(k)}[F(R_{k,2}, i_j \| \mathsf{ctr})]$ and ignore the results. Note that the counter value is used to create random reads when the state is not reset, while providing the same random values if the state is reset.


Finally, it encrypts and appends $(i_j, v_j)$ to the end of $A_0$ and continues to the next index $i_{j+1}$. We note that above, when accessing a table using the output of $F$, we are interpreting the bit string output by $F$ as a random index from the appropriate range.

After all the indices have been read and written to $A_0$, the client initiates a rebuild phase, the description of which we defer for now.
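A minimal Python sketch of the per-level logic of ORead follows. It assumes HMAC-SHA256 as the PRF $F$ (reduced modulo the table size), stores plaintext pairs instead of ciphertexts, uses 0-based levels, and omits the rebuild phase; the `Level` record, the `oread` function, and the returned probe trace are hypothetical conveniences for illustration. The stash is read in full at every level, so only the two table positions per level depend on the address, and the trace records exactly those.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List, Optional, Tuple

Entry = Optional[Tuple[int, bytes]]          # None stands for an encryption of ⊥


def F(key: bytes, data: bytes, m: int) -> int:
    """HMAC-SHA256 as a stand-in PRF, interpreted as an index in [0, m)."""
    return int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big") % m


@dataclass
class Level:
    T1: List[Entry]
    T2: List[Entry]
    S: List[Entry]
    K: Tuple[bytes, bytes]                   # (K_{k,1}, K_{k,2}): real positions
    R: Tuple[bytes, bytes]                   # (R_{k,1}, R_{k,2}): dummy positions


def oread(levels: List[Level], addrs: List[int], ctr: int,
          A0: List[Tuple[int, Optional[bytes]]]):
    values: List[Optional[bytes]] = []
    trace: List[Tuple[int, int, int]] = []   # (level, T1 slot, T2 slot) per probe
    for i in addrs:
        found, v = False, None
        a = i.to_bytes(8, "big")
        for k, lev in enumerate(levels):
            m = len(lev.T1)
            if not found:
                p1, p2 = F(lev.K[0], a, m), F(lev.K[1], a, m)
                # Read the whole stash plus one slot in each table.
                for e in lev.S + [lev.T1[p1], lev.T2[p2]]:
                    if e is not None and e[0] == i:
                        found, v = True, e[1]
            else:
                # Already found: the stash is still read in full (contents ignored),
                # and the table probes are derived from i || ctr instead.
                c = a + ctr.to_bytes(8, "big")
                p1, p2 = F(lev.R[0], c, m), F(lev.R[1], c, m)
            trace.append((k, p1, p2))
        values.append(v)
        A0.append((i, v))                    # encrypted before appending in the scheme
    return values, trace
```

Calling `oread` twice from the same state with the same `ctr` yields identical traces, which is the property the modified scheme relies on under rewinding.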
Figure 2: Server data structures in the ORAM instantiation.


Rebuild phase. We complete the scheme description by describing a rebuild phase, which works as follows. The client repeats the following process until $A_0$ is empty:

•  Increment ctr.

•  Remove and decrypt an item from $A_0$, calling the result $(j, v)$.

•  Let $r > 0$ be the largest integer such that $2^r$ divides $(\mathsf{ctr} \bmod 2^L)$.

•  Select new keys $K_{r,1}, K_{r,2}$ and use the functions $F(K_{r,1}, \cdot)$ and $F(K_{r,2}, \cdot)$ as $h_1$ and $h_2$ to obliviously build a new cuckoo table $C'_r$ holding the removed item $(j, v)$ and all of the data items in $C_1, \ldots, C_{r-1}$, freshly re-encrypted and with duplicates removed.

•  Then, for $j = 1$ to $r - 1$, set $K_{j,1}, K_{j,2}$ to fresh random keys and set the cuckoo tables $C_1, \ldots, C_{r-1}$ to be new, empty tables, and set $C_r$ to be $C'_r$.

Note that the remaining tables $C_{r+1}, \ldots, C_L$ are not touched.

We can implement the rebuild phase using any of the protocols (with small variations) from [15, 16]. The most efficient gives an amortized overhead of $\log \ell$ operations for all rebuilds, assuming that the client can temporarily store $\ell^{\epsilon}$ memory slots locally during the protocol (but the client does not need to store them between executions of the protocol). If we only allow the client to store a constant number of slots at any one time, then we incur an overhead of $\log^2 \ell$. In either case, the worst-case overhead is $O(\ell)$. Using the de-amortization techniques from [16, 24], we can achieve a worst-case complexity of $\log^2 \ell$, at the cost of doubling the server storage. This technique was analyzed in the original ORAM security setting, but it is not hard to extend our proof to show that it preserves next-read pattern hiding as well.

6.3  Next-Read  Pattern  Hiding 

Theorem 2. Assuming that $F$ is a secure PRF and the underlying encryption scheme is chosen-plaintext secure, the scheme $\mathcal{O}$ described above is next-read pattern hiding.

Proof. We show that for any efficient adversary A, the probabilities that A outputs 1 when playing $\mathsf{NextReadGame}^{0}_{\mathcal{A}}$ and $\mathsf{NextReadGame}^{1}_{\mathcal{A}}$ differ by only a negligible amount. In these games, the adversary A provides two tuples of operations $P_1 = (op_1, \ldots, op_{q_1})$ and $P_2 = (rop_1, \ldots, rop_{q_2})$, the latter consisting entirely of multi-reads, and a permutation $\pi$ on $[\ell]$. Then in $\mathsf{NextReadGame}^{0}_{\mathcal{A}}$, A is given the transcript of an honest client and server executing $P_1$, as well as the transcript of executing the multi-reads in $P_2$ with rewinds after each operation, while in $\mathsf{NextReadGame}^{1}_{\mathcal{A}}$ it is given the same transcript except that the second part is generated by first permuting the addresses in $P_2$ according to $\pi$.

We  need  to  argue  that  these  inputs  are  computationally  indistinguishable.  For  our  analysis  below,  we 
assume  that  a  rebuild  phase  never  fails,  as  this  event  happens  with  negligible  probability  in  A,  as  discussed 
before.  We  start  by  modifying  the  execution  of  the  games  in  two  ways  that  are  shown  to  be  undetectable  by 
A.  The  first  change  will  show  that  all  of  the  accesses  into  tables  appear  to  the  adversary  to  be  generated  by 
random  functions,  and  the  second  change  will  show  that  the  ciphertexts  do  not  reveal  any  usable  information 
for  the  adversary. 

First, whenever keys $K_{j,1}, K_{j,2}$ are chosen and used with the function $F$, we use random functions $g_{j,1}, g_{j,2}$ in place of $F(K_{j,1}, \cdot)$ and $F(K_{j,2}, \cdot)$.¹¹ We do the same for the $R_{j,1}, R_{j,2}$ keys, calling the random functions $r_{j,1}$ and $r_{j,2}$. This change alters the behavior of A by only a negligible amount, as otherwise we could build a distinguisher contradicting the PRF security of $F$ via a standard hybrid argument over all of the keys chosen during the game.
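The lazy sampling of random functions described in footnote 11 is a standard technique; a minimal Python sketch follows, with the hypothetical class name `LazyRandomFunction`. Each fresh input receives an independent uniform value, which is remembered so that repeated queries are answered consistently; storage grows only with the (polynomial) number of distinct queries.

```python
import secrets
from typing import Dict, Hashable


class LazyRandomFunction:
    """Random function with range [0, m), sampled lazily on first use."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.table: Dict[Hashable, int] = {}   # inputs queried so far -> outputs

    def __call__(self, x: Hashable) -> int:
        if x not in self.table:                # fresh input: sample a new value
            self.table[x] = secrets.randbelow(self.m)
        return self.table[x]                   # repeated input: same answer


g = LazyRandomFunction(m=1 << 20)   # plays the role of, e.g., g_{j,1}
first = g(42)
assert g(42) == first               # consistent on repeated queries
```

In the proof, a rebuild that "chooses new random functions" corresponds to discarding such an object and starting a fresh one with an empty table.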

The  second  change  we  make  is  that  all  of  the  ciphertexts  in  the  transcript  are  replaced  with  independent 
encryptions  of  equal-length  strings  of  zeros.  We  claim  that  this  only  affects  the  output  distribution  of  A  by 
a  negligible  amount,  as  otherwise  we  could  build  an  adversary  to  contradict  the  IND-CPA  security  of  the 
underlying  encryption  scheme  via  a  standard  reduction.  Here  it  is  crucial  that,  after  each  rewind,  the  client 
chooses  new  randomness  for  the  encryption  scheme. 

We now complete the proof by showing that the distribution of the transcripts given to A is identical in the modified versions of $\mathsf{NextReadGame}^{0}_{\mathcal{A}}$ and $\mathsf{NextReadGame}^{1}_{\mathcal{A}}$. To see why this is true, let us examine what is in one of the game transcripts given to A. The transcript for the execution of $P_1$ consists of ORead and OWrite transcripts, which are accesses to indices in the cuckoo hash tables, ciphertext writes into $A_0$, and rebuild phases. Finally, the execution of $P_2$ (either permuted by $\pi$ or not) with rewinds generates a transcript that consists of several accesses to the cuckoo hash tables, each followed by writes to $A_0$ and a rebuild phase.

By construction of the protocol, in the modified game the only part of the transcript that depends on the addresses in $P_2$ is the reads into $T_1^{(k)}$ and $T_2^{(k)}$ for each $k$. All other parts of the transcript are oblivious scans of the arrays and oblivious table rebuilds, which do not depend on the addresses (recall that the ciphertexts in these transcripts are encryptions of zeros). Thus we focus on the indices read in each $T_1^{(k)}$ and $T_2^{(k)}$, and need to show that, in the modified games, the distribution of these indices does not depend on the addresses in $P_2$.

The key observation is that, after the execution of $P_1$, the state of the client is such that each address $i$ will induce a uniformly random sequence of indices in the tables that is independent of the indices read for any other address and independent of the transcript for $P_1$. If the data is in the cuckoo table at level $k$, then the indices will be

$$\left(g_{j,1}(i),\, g_{j,2}(i)\right)_{j=1}^{k} \quad \text{and} \quad \left(r_{j,1}(i \| \mathsf{ctr}),\, r_{j,2}(i \| \mathsf{ctr})\right)_{j=k+1}^{L}.$$

Thus each $i$ induces a random sequence, and each address generates an independent sequence. We claim moreover that the sequence for $i$ is independent of the transcript for $P_1$. This follows from the construction. For the indices derived from $r_{j,1}$ and $r_{j,2}$, the transcript for $P_1$ would always have used a lower value for ctr. For the indices derived from $g_{j,1}$ and $g_{j,2}$, the execution of $P_1$ would not have evaluated those functions on input $i$: if $i$ was read during $P_1$, then $i$ would have been written to $A_0$, and a rebuild phase would have chosen new random functions $g_{j,1}$ and $g_{j,2}$ before the address/value pair for $i$ was placed in the $j$-th level table again.

¹¹ As usual, instead of actually picking and using a random function, which is an exponential task, we create random numbers whenever necessary and remember them. Since there will be only polynomially many interactions, this only requires polynomial time and space.

With this observation we can complete the proof. When the modified games generate the transcript for the multi-read operations in $P_2$, each individual read for an index $i$ induces a random sequence of table reads among its other oblivious operations. But since each $i$ induces a completely random sequence, and permuting the addresses only permutes the random sequences associated with the addresses, the distribution of the transcript is unchanged. Thus no adversary can distinguish the modified games, which, by the two changes above, means that no efficient adversary can distinguish $\mathsf{NextReadGame}^{0}_{\mathcal{A}}$ and $\mathsf{NextReadGame}^{1}_{\mathcal{A}}$ with more than negligible advantage, as required. □

6.4  Authenticity, Extensions, and Optimizations

Authenticity.  To  achieve  authenticity  we  sketch  how  to  employ  the  technique  introduced  in  [13].  A 
straightforward  attempt  is  to  tag  every  ciphertext  stored  on  the  server  along  with  its  location  on  the  server 
using a message authentication code (MAC). But this fails because the server can “roll back” changes to the data by replacing ciphertexts with previously stored ones at the same location. We can generically fix this by using the techniques of memory checking [5, 23, 11] at some additional logarithmic overhead. However, it also turns out that authenticity can be added at almost no cost to several specific constructions, as we describe below.

Goldreich  and  Ostrovsky  showed  that  any  ORAM  protocol  supporting  time  labeled  simulation  (TLS)  can 
be  modified  to  achieve  authenticity  without  much  additional  complexity.  We  say  that  an  ORAM  protocol 
supports TLS if there exists an efficient algorithm $Q$ such that, after the $j$-th message is sent to the server, for each index $x$ on the server memory, the number of times $x$ has been written to is equal to $Q(j, x)$.¹² Overall, one implements the above tagging strategy, also includes $Q(j, x)$ with the data being tagged, and, when reading, recomputes $Q(j, x)$ to verify the tag.
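The tagging strategy with time labels can be sketched as follows. HMAC-SHA256 is used as the MAC purely for illustration, and the write count passed to `verify` is assumed to come from the scheme's algorithm $Q(j, x)$, which is not implemented here; `tag` and `verify` are hypothetical helper names. The usage lines show why binding the write count defeats roll-back: a replayed older ciphertext carries a tag for an older count.

```python
import hashlib
import hmac


def tag(mac_key: bytes, location: int, write_count: int, ciphertext: bytes) -> bytes:
    """MAC over (location, write count, ciphertext); fixed-width fields come first."""
    msg = location.to_bytes(8, "big") + write_count.to_bytes(8, "big") + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def verify(mac_key: bytes, location: int, expected_count: int,
           ciphertext: bytes, received_tag: bytes) -> bool:
    """Recompute the tag with the count the client expects (Q(j, x) in the text)."""
    expected = tag(mac_key, location, expected_count, ciphertext)
    return hmac.compare_digest(expected, received_tag)


key = b"k" * 32
old = (b"ciphertext-v1", tag(key, 5, 1, b"ciphertext-v1"))   # 1st write to slot 5
new = (b"ciphertext-v2", tag(key, 5, 2, b"ciphertext-v2"))   # 2nd write to slot 5
assert verify(key, 5, 2, *new)        # honest server returns the current value
assert not verify(key, 5, 2, *old)    # a rolled-back value is rejected
```

Including the location in the MAC input also prevents the server from moving a valid ciphertext to a different slot.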

Our  scheme  can  be  shown  to  support  TLS  in  a  manner  very  similar  to  the  original  hierarchical  scheme  [13]. 
The  essential  observation,  also  used  there,  is  that  the  table  indices  are  only  written  to  during  a  rebuild  phase, 
so  by  tracking  the  number  of  executed  rebuild  phases  we  can  compute  how  many  times  each  index  of  the 
table  was  written  to. 

Extensions  and  optimizations.  The  scheme  above  is  presented  in  a  simplified  form  that  can  be  made 
more  efficient  in  several  ways  while  maintaining  security. 

•  The  keys  in  the  client  state  can  be  derived  from  a  single  key  by  appropriately  using  the  PRF.  This 
shrinks  the  client  state  to  a  single  key  and  counter. 

•  The  initial  table  C\  can  be  made  larger  to  reduce  the  number  of  rebuild  phases  (although  this  does  not 
affect  the  asymptotic  complexity). 

•  We  can  collapse  the  individual  oblivious  table  rebuilds  into  one  larger  rebuild. 

•  It was shown in [17] that all of the $L$ cuckoo hash tables can share a single $O(\lambda)$-size stash $S$ while still maintaining a negligible chance of table failure.

•  Instead  of  doing  table  rebuilds  all  at  once,  we  can  employ  a  technique  that  allows  for  them  to  be  done 
incrementally,  allowing  us  to  achieve  worst-case  rather  than  amortized  complexity  guarantees  [16,  24]. 
These  techniques  come  at  the  cost  of  doubling  the  server  storage. 

•  The  accesses  to  cuckoo  tables  on  each  level  during  a  multi-read  can  be  done  in  parallel,  which  reduces 
the  round  complexity  of  that  part  to  be  independent  of  t,  the  number  of  addresses  being  read. 

We  can  also  extend  this  scheme  to  support  a  dynamically  changing  memory  size.  This  is  done  by  simply 
allocating  different  sized  tables  during  a  rebuild  that  eliminate  the  lower  larger  tables  or  add  new  ones  of 
the  appropriate  size.  This  modification  will  achieve  next-read  pattern  hiding  security,  but  it  will  not  be 
standard pattern-hiding secure, as it leaks some information about the number of memory slots in use. One can, however, formalize its security in a pattern-hiding model where any two sequences with equal memory usage are required to be indistinguishable.

¹² Here we mean actual writes on the server, and not OWrite executions.

Efficiency. In this scheme the client stores the counter and the keys, which can be derived from a single key using the PRF. The server stores $\log \ell$ tables, where the $j$-th table requires $2^j + \lambda$ memory slots, which sums to $O(\ell + \lambda \cdot \log \ell)$. Using the optimization above, we only need a single stash, reducing the sum to $O(\ell + \lambda)$. When executing ORead, each index read requires accessing two slots plus the $\lambda$ stash slots in each of the $\log \ell$ tables, followed by a rebuild. OWrite is simply one write followed by a rebuild phase. The table below summarizes the efficiency measures of the scheme.
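As a rough sanity check of the storage and read counts above, the following Python sketch tallies slots for concrete parameters. It assumes (our assumption, for illustration only) that $\ell$ is a power of two and that the levels are indexed $j = 1, \ldots, \log_2 \ell$; it counts slots only and ignores PRF evaluations and the rebuild phase.

```python
def server_slots(ell: int, lam: int, shared_stash: bool) -> int:
    """Total server slots for levels j = 1..log2(ell).

    Level j holds a cuckoo table of 2**j slots; the stash is either one
    lam-size stash per level or a single shared lam-size stash (assumed layout).
    """
    levels = ell.bit_length() - 1                        # log2(ell) when ell is a power of two
    tables = sum(2 ** j for j in range(1, levels + 1))  # equals 2*ell - 2
    stash = lam if shared_stash else lam * levels
    return tables + stash


def read_slots_per_level_stash(ell: int, lam: int) -> int:
    """Slots touched by one ORead before its rebuild: two cuckoo slots plus lam stash slots per level."""
    levels = ell.bit_length() - 1
    return levels * (2 + lam)


if __name__ == "__main__":
    ell, lam = 1024, 40
    print(server_slots(ell, lam, shared_stash=False))  # 2046 table slots + 400 stash slots
    print(server_slots(ell, lam, shared_stash=True))   # 2046 table slots + 40 stash slots
    print(read_slots_per_level_stash(ell, lam))        # 10 levels * 42 slots
```

With $\ell = 1024$ and $\lambda = 40$, sharing the stash shrinks the stash overhead from $\lambda \log \ell = 400$ to $\lambda = 40$ slots, matching the $O(\ell + \lambda \log \ell)$ versus $O(\ell + \lambda)$ bounds.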


Client Storage | Server Storage | Read Complexity | Write Complexity
$O(1)$ | $O(\ell + \lambda)$ | $O(\lambda \cdot \log \ell) + \mathrm{RP}$ | $O(1) + \mathrm{RP}$

Table 1: Efficiency of the ORAM scheme above. "RP" denotes the aggregate cost of the rebuild phases, which is $O(\log \ell)$, or $O(\log^2 \ell)$ in the worst case, per our discussion above.


7  Efficiency 

We now look at the efficiency of our PORAM construction, when instantiated with the ORAM scheme from Section 6 (we assume the rebuild phases are implemented via the Goodrich-Mitzenmacher algorithm [15] with the worst-case complexity optimization [16, 24]). Since our PORAM scheme preserves (standard) ORAM security, we analyze its efficiency in three ways. First, we look at the overhead of the PORAM scheme on top of just storing the data inside the ORAM without attempting to achieve any PoR security (e.g., not using any error-correcting code). Second, we look at the overall efficiency of PORAM. Third, we compare it with dynamic PDP [12, 30], which does not employ erasure codes and does not provide a full retrievability guarantee. In the table below, $\ell$ denotes the size of the client data and $\lambda$ is the security parameter. We assume that the ORAM scheme uses a PRF whose computation takes $O(\lambda)$ work.


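PORAM Efficiency | vs. ORAM | Overall | vs. Dynamic PDP [12]
Client Storage | Same | $O(\lambda)$ | Same
Server Storage | $\times O(1)$ | $O(\ell)$ | $\times O(1)$
Read Complexity | $\times O(1)$ | $O(\lambda \log^2 \ell)$ | $\times O(\log \ell)$
Write Complexity | $\times O(\lambda)$ | $O(\lambda^2 \times \log^2 \ell)$ | $\times O(\lambda \times \log \ell)$
Audit Complexity | Read $\times O(\lambda)$ | $O(\lambda^2 \times \log^2 \ell)$ | $\times O(\log \ell)$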

By modifying the underlying ORAM to dynamically resize tables during rebuilds, the resulting PORAM instantiation will achieve the same efficiency measures as above, but with $\ell$ taken to be the amount of memory currently used by the memory access sequence. This is in contrast to the usual ORAM setting where $\ell$ is taken to be a (perhaps large) upper bound on the total amount of memory that will ever be used.
A  Simple Dynamic PoR with Square-Root Complexity


We sketch a very simple construction of dynamic PoR that achieves sub-linear complexity in its read, write and audit operations. Although the scheme is asymptotically significantly worse than our PORAM solution as described in the main body, it is significantly simpler and may be of interest for some practical parameter settings.

The construction starts with the first dynamic PoR proposal from the introduction. To store a memory $M \in \Sigma^{\ell}$ on the server, the client divides it into $L = \sqrt{\ell}$ consecutive message blocks $(m_1, \ldots, m_L)$, each containing $L = \sqrt{\ell}$ symbols. The client then encodes each of the message blocks $m_i$ using an $(n = 2L, k = L, d = L + 1)$-erasure code (e.g., Reed-Solomon tolerating $L$ erasures), to form a codeword block $c_i$, and concatenates the codeword blocks to form a string $C = (c_1, \ldots, c_L) \in \Sigma^{2\ell}$, which it then stores on the server. We can assume the code is systematic so that the message block $m_i$ resides in the first $L$ symbols of the corresponding codeword block $c_i$. In addition, the client initializes a memory checking scheme [5, 23, 11], which it uses to authenticate each of the $2\ell$ codeword symbols within $C$.
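To make the systematic $(2L, L, L+1)$ erasure code concrete, here is a minimal Reed-Solomon sketch in Python. Assumptions (ours, not from the text): symbols are integers modulo the prime $P = 2^{31} - 1$, a message block is read as the values of a polynomial of degree less than $L$ at $x = 0, \ldots, L-1$, and the codeword lists its values at $x = 0, \ldots, 2L-1$, so the first $L$ codeword symbols equal the message. Decoding interpolates through any $L$ surviving positions; the memory checking layer is omitted.

```python
from typing import List, Optional, Tuple

P = 2**31 - 1  # prime field modulus (an illustrative choice)


def _lagrange_eval(points: List[Tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(message: List[int]) -> List[int]:
    """Systematic encoding: codeword[x] = f(x) for x < 2k, where f(x) = message[x] for x < k."""
    k = len(message)
    points = list(enumerate(message))
    return [_lagrange_eval(points, x) for x in range(2 * k)]


def decode(received: List[Optional[int]], k: int) -> List[int]:
    """Recover the k message symbols from a length-2k codeword with None marking erasures."""
    known = [(x, y) for x, y in enumerate(received) if y is not None]
    if len(known) < k:
        raise ValueError("more than k erasures: block is unrecoverable")
    return [_lagrange_eval(known[:k], x) for x in range(k)]
```

For example, encoding a block of $L = 4$ symbols yields 8 codeword symbols whose first 4 are the message itself, and any 4 erasures can be tolerated, while a fifth erasure makes the block unrecoverable.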

To read a location $j \in [\ell]$ of memory, the client computes the index $i \in [L]$ of the message block $m_i$ containing that location, and downloads the appropriate symbol of the codeword block $c_i$ which contains the value $M[j]$ (here we use that the code is systematic), which it checks for authenticity via the memory checking scheme. To write to a location $j \in [\ell]$, the client downloads the entire corresponding codeword block $c_i$ (checking for authenticity), decodes $m_i$, changes the appropriate location to get an updated block $m'_i$, and finally re-encodes it to get $c'_i$, which it then writes to the server, updating the appropriate authentication information within the memory checking scheme. The audit protocol selects $t = \lambda$ (the security parameter) random positions within every codeword block $c_i$ and checks them for authenticity via the memory checking scheme.

The read and write protocols of this scheme use the memory checking scheme to access 1 and $O(\sqrt{\ell})$ symbols, respectively (a write both reads and rewrites a whole codeword block of $2\sqrt{\ell}$ symbols). The audit protocol reads and checks $\lambda\sqrt{\ell}$ symbols. Assuming an efficient (polylogarithmic) memory checking protocol, the actual complexity of these protocols incurs another $O(\log \ell)$ factor, and server storage grows by another constant factor. Therefore the complexity of the reads, writes, and audit is $O(1)$, $O(\sqrt{\ell})$, $O(\sqrt{\ell})$ respectively, ignoring factors that depend on the security parameter or are polylogarithmic in $\ell$.

Note that the above scheme actually gives us a natural trade-off between the complexity of the writes and the audit protocol. In particular, for any $0 < \delta < 1$, we can set the message block size to $L_1 = \ell^{\delta}$ symbols, so that the client memory $M$ now consists of $L_2 = \ell^{1-\delta}$ such blocks. In this case, the complexity of reads, writes, and audits becomes $O(1)$, $O(\ell^{\delta})$, $O(\ell^{1-\delta})$ respectively.
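The counting behind these bounds can be checked numerically. The Python sketch below works under our own simplifying assumptions: it counts only codeword symbols touched, ignores the $O(\log \ell)$ memory checking overhead, rounds $\ell^{\delta}$ to an integer, and uses $\lambda$ audit positions per codeword block as in the text.

```python
import math


def por_symbol_costs(ell: int, delta: float, lam: int) -> tuple:
    """Symbols touched by (read, write, audit) when message blocks hold about ell**delta symbols."""
    block = max(1, round(ell ** delta))   # L1: symbols per message block
    num_blocks = math.ceil(ell / block)   # L2: number of message (and codeword) blocks
    read = 1                              # one systematic codeword symbol
    write = 2 * block                     # download the 2*L1-symbol codeword block (same amount re-uploaded)
    audit = lam * num_blocks              # lam random positions in every codeword block
    return read, write, audit
```

For $\ell = 2^{20}$ and $\lambda = 40$, $\delta = 1/2$ gives writes of 2048 symbols and audits of 40960 symbols, while $\delta = 1/4$ cuts writes to 64 symbols at the price of audits touching 1310720 symbols.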

B  Standard  Pattern  Hiding  for  ORAM 

We  recall  an  equivalent  definition  to  the  one  introduced  by  Goldreich  and  Ostrovsky  [13].  Informally, 
standard  pattern  hiding  says  that  an  (arbitrarily  malicious  and  efficient)  adversary  cannot  detect  which 
sequence  of  instructions  a  client  is  executing  via  the  ORAM  protocols. 

Formally, for a bit $b$ and an adversary $\mathcal{A}$, we define the game $\mathrm{ORAMGame}^b(\mathcal{A})$ as follows:

• The attacker $\mathcal{A}(1^\lambda)$ outputs two equal-length ORAM protocol sequences $Q_0 = (op_0, \ldots, op_q)$, $Q_1 = (op'_0, \ldots, op'_q)$. We require that for each index $j$, the operations $op_j$ and $op'_j$ only differ in the location they access and the values they are writing, but otherwise correspond to the same operation (read or write).

• The challenger initializes an honest client $C$ and server $S$, and sequentially executes the operations in $Q_b$ between $C$ and $S$.

• Finally, $\mathcal{A}$ is given the complete transcript of all the protocol executions, and he outputs a bit $\tilde{b}$, which is the output of the game.
We say that an ORAM protocol is pattern hiding if for all efficient adversaries $\mathcal{A}$ we have:

$$|\Pr[\mathrm{ORAMGame}^0(\mathcal{A}) = 1] - \Pr[\mathrm{ORAMGame}^1(\mathcal{A}) = 1]| \leq \mathrm{negl}(\lambda).$$
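The game can be exercised on toy schemes. In the purely illustrative Python sketch below, a transcript records only which physical slot is touched (contents are assumed to be hidden by encryption). A hypothetical "linear scan" scheme that touches every slot on every operation yields identical transcripts for $Q_0$ and $Q_1$, so any distinguisher has advantage 0, whereas a naive scheme that touches only the requested address is broken with advantage 1. Both toy schemes are deterministic, so each probability in the definition is exactly 0 or 1 here; real ORAM schemes are randomized.

```python
from typing import Callable, List, Tuple

Op = Tuple[str, int, int]   # (kind, logical address, value); kind is "read" or "write"
Transcript = List[int]      # physical slot indices observed by the server


def linear_scan(ops: List[Op], size: int) -> Transcript:
    """Toy pattern-hiding scheme: each operation touches all `size` slots in order."""
    return [slot for _ in ops for slot in range(size)]


def direct_access(ops: List[Op], size: int) -> Transcript:
    """Insecure toy scheme: each operation touches exactly its logical address."""
    return [addr for _kind, addr, _val in ops]


def oram_game(b: int, q0: List[Op], q1: List[Op],
              guess: Callable[[Transcript], int],
              scheme: Callable[[List[Op], int], Transcript], size: int) -> int:
    """ORAMGame^b: run Q_b through `scheme` and return the adversary's output bit."""
    assert len(q0) == len(q1)
    assert all(x[0] == y[0] for x, y in zip(q0, q1))  # same read/write type at each index
    return guess(scheme((q0, q1)[b], size))


def advantage(q0, q1, guess, scheme, size) -> int:
    """|Pr[game^0 = 1] - Pr[game^1 = 1]|, exact here because the toy schemes are deterministic."""
    return abs(oram_game(0, q0, q1, guess, scheme, size)
               - oram_game(1, q0, q1, guess, scheme, size))


if __name__ == "__main__":
    q0 = [("read", 0, 0), ("write", 3, 7)]
    q1 = [("read", 1, 0), ("write", 2, 9)]
    guess = lambda t: int(t[0] == 1)  # "was slot 1 touched first?"
    print(advantage(q0, q1, guess, linear_scan, 8))
    print(advantage(q0, q1, guess, direct_access, 8))
```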

Sometimes we also want to achieve a stronger notion of security where we also wish to hide whether each operation is a read or a write. This can be done generically by always first executing a read for the desired location and then executing a write that either writes back the value just read (when we only wanted to do a read) or writes in a new value.
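A minimal Python sketch of this generic transformation, assuming a hypothetical ORAM object exposing `oread(addr)` and `owrite(addr, value)` (these names are ours). Every logical access issues exactly one ORead followed by one OWrite, so the sequence of protocol types no longer depends on whether the caller wanted a read or a write.

```python
class LoggedMemory:
    """Stand-in for an ORAM: a plain list plus a log of which protocol ran (illustrative only)."""

    def __init__(self, size: int):
        self.mem = [0] * size
        self.calls = []

    def oread(self, addr: int) -> int:
        self.calls.append("ORead")
        return self.mem[addr]

    def owrite(self, addr: int, value: int) -> None:
        self.calls.append("OWrite")
        self.mem[addr] = value


def access(oram, kind: str, addr: int, value=None):
    """Read-then-write: a logical 'read' writes back the value it just read."""
    old = oram.oread(addr)
    oram.owrite(addr, old if kind == "read" else value)
    return old
```

A logical write followed by a logical read of the same address produces the call pattern ORead, OWrite, ORead, OWrite.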

C  Standard  ORAM  Security  Does  not  Suffice  for  PORAM 

In this section we construct an ORAM that is secure in the usual sense but is not next-read pattern hiding. In fact, we will show something stronger: if the ORAM below were used to instantiate our PORAM scheme, then the resulting dynamic PoR scheme would not be secure. This shows that some notion of security beyond regular ORAM is necessary for the security of PORAM.

Counterexample construction. We can take any ORAM scheme (e.g., the one in Section 6 for concreteness) and modify it by "packing" multiple consecutive logical addresses into a single slot of the ORAM. In particular, if the client initializes the modified ORAM (called MORAM within this section) with alphabet $\Sigma = \{0, 1\}^w$, it will translate this into initializing the original ORAM with the alphabet $\Sigma^n = \{0, 1\}^{nw}$, where each symbol in the modified alphabet "packs" together $n$ symbols of the original alphabet. Assume this is the same $n$ as the codeword length in our PORAM protocol.

Whenever the client wants to read some address $i$ using MORAM, the modified scheme looks up where it was packed by computing $j = \lfloor i/n \rfloor$, uses the original ORAM scheme to execute ORead($j$), and then parses the resulting output as $(v_0, \ldots, v_{n-1}) \in \Sigma^n$ and returns $v_{i \bmod n}$. To write $v$ to address $i$, MORAM runs the ORAM scheme's ORead($\lfloor i/n \rfloor$) to get $(v_0, \ldots, v_{n-1})$ as before, then sets $v_{i \bmod n} \leftarrow v$ and writes the data back via the ORAM scheme's OWrite($\lfloor i/n \rfloor, (v_0, \ldots, v_{n-1})$). It is not hard to show that this modified scheme retains standard ORAM security, since it hides which locations are being read/written.
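The packing transformation can be sketched in Python as follows. `LoggedStore` is a hypothetical stand-in for the underlying ORAM (a plain list whose `oread`/`owrite` calls are logged); a real instantiation would run the ORAM protocols instead. The log makes visible that all $n$ logical addresses in one group map to the same underlying slot $\lfloor i/n \rfloor$.

```python
class LoggedStore:
    """Hypothetical stand-in for the underlying ORAM over alphabet Sigma^n."""

    def __init__(self, slots: int, n: int):
        self.data = [tuple([0] * n) for _ in range(slots)]
        self.log = []

    def oread(self, j: int) -> tuple:
        self.log.append(("ORead", j))
        return self.data[j]

    def owrite(self, j: int, packed: tuple) -> None:
        self.log.append(("OWrite", j))
        self.data[j] = packed


class MORAM:
    """Packs n consecutive logical addresses into one slot of the inner ORAM."""

    def __init__(self, inner, n: int):
        self.inner, self.n = inner, n

    def read(self, i: int):
        packed = self.inner.oread(i // self.n)  # j = floor(i / n)
        return packed[i % self.n]               # v_{i mod n}

    def write(self, i: int, v) -> None:
        j = i // self.n
        packed = list(self.inner.oread(j))      # read the whole packed slot first
        packed[i % self.n] = v                  # v_{i mod n} <- v
        self.inner.owrite(j, tuple(packed))
```

With $n = 4$, writing address 5 and then reading addresses 5 and 4 touches underlying slot 1 every time, which is exactly the repetition exploited in the next paragraph.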

We next discuss why this modification causes the MORAM to not be NRPH secure. Consider what happens if the client issues a read for an address, say $i = 0$, and then is rewound and reads another address that was packed into the same ORAM slot, say $i + 1$. Both operations will cause the client to issue ORead(0). And since our MORAM is deterministic, the client will access exactly the same table indices at every level on the server on both runs. But if these addresses were permuted so as not to be packed together (e.g., blocks were packed using equivalence classes of their indices mod $\ell/n$), then the client would issue ORead commands on different addresses, reading different table positions (with high probability), thus allowing the server to distinguish which case it was in and break NRPH security.

This establishes that the modified scheme is not NRPH secure. To see why PORAM is not secure with MORAM, consider an adversary that, after a sequence of many read/write operations, randomly deletes one block of its storage (say, from the lowest-level cuckoo table). If this block happens to contain a non-dummy ciphertext that contains actual data (which occurs with reasonable probability), then this attack corresponds to deleting some codeword block in full (because all codeword symbols corresponding to a message block were packed in the same ORAM storage location), even though the server does not necessarily know which one. Therefore, the underlying message block can never be recovered from the attacker. But this adversary can still pass an audit with good probability, because the audit would only catch the adversary if it happened to access the deleted block during its reads, either by (1) selecting exactly this location to check during the audit, or (2) reading this location in the cuckoo table slot as a dummy read. This happens with relatively low probability, around $1/\ell$, where $\ell$ is the number of addresses in the client memory.

To provide some more intuition, we can also examine why this same attack (deleting a random location in the lowest-level cuckoo table) does not break PORAM when instantiated with the ORAM implementation from Section 6 that is NRPH secure. After this attack, the adversary still maintains a good probability of passing a subsequent audit. However, by deleting only a single ciphertext in one of the cuckoo tables, the attacker has now deleted only a single codeword symbol, not a full block of $n$ of them. And now we can show that our extractor can still recover enough of the other symbols of the codeword block so that the erasure code will enable recovery of the original data. Of course, the server could start deleting more of the locations in the lowest-level cuckoo table, but he cannot selectively target codeword symbols belonging to a single codeword block, since he has no idea where those reside. If he starts to delete too many of them just to make sure a message block is not recoverable, then he will lose his ability to pass an audit.
Abstract

The  Short  Integer  Solution  (SIS)  and  Learning  With  Errors  (LWE)  problems  are  the  foundations  for 
countless  applications  in  lattice-based  cryptography,  and  are  provably  as  hard  as  approximate  lattice 
problems in the worst case. An important question from both a practical and theoretical perspective is how
small  their  parameters  can  be  made,  while  preserving  their  hardness. 

We prove two main results on SIS and LWE with small parameters. For SIS, we show that the problem retains its hardness for moduli $q \geq \beta \cdot n^{\delta}$ for any constant $\delta > 0$, where $\beta$ is the bound on the Euclidean norm of the solution. This improves upon prior results which required $q \geq \beta \cdot \sqrt{n \log n}$, and is essentially optimal since the problem is trivially easy for $q \leq \beta$. For LWE, we show that it remains hard even when the errors are small (e.g., uniformly random from $\{0, 1\}$), provided that the number of samples is small enough (e.g., linear in the dimension $n$ of the LWE secret). Prior results required the errors to have magnitude at least $\sqrt{n}$ and to come from a Gaussian-like distribution.


1  Introduction 

In modern lattice-based cryptography, two average-case computational problems serve as the foundation of almost all cryptographic schemes: Short Integer Solution (SIS), and Learning With Errors (LWE). The SIS problem dates back to Ajtai's pioneering work [1], and is defined as follows. Let $n$ and $q$ be integers, where $n$ is the primary security parameter and usually $q = \mathrm{poly}(n)$, and let $\beta > 0$. Given a uniformly random matrix $A \in \mathbb{Z}_q^{n \times m}$ for some $m = \mathrm{poly}(n)$, the goal is to find a nonzero integer vector $z \in \mathbb{Z}^m$ such that $Az = 0 \bmod q$ and $\|z\| \leq \beta$ (where $\|\cdot\|$ denotes Euclidean norm). Observe that $\beta$ should be set large enough to ensure that a solution exists (e.g., $\beta > \sqrt{n \log q}$ suffices), but that $\beta \geq q$ makes the problem trivially easy to solve. Ajtai showed that for appropriate parameters, SIS enjoys a remarkable worst-case/average-case hardness property: solving it on the average (with any noticeable probability) is at least as hard as approximating several lattice problems on $n$-dimensional lattices in the worst case, to within $\mathrm{poly}(n)$ factors.
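The definition can be made concrete with a small checker. This Python sketch (helper names are ours) verifies the three SIS conditions for a candidate $z$, and also shows why $\beta \geq q$ is trivial: $z = q \cdot e_1$ always satisfies $Az = 0 \bmod q$ and has norm exactly $q$.

```python
import math
from typing import List


def is_sis_solution(A: List[List[int]], z: List[int], q: int, beta: float) -> bool:
    """True iff z is nonzero, A z = 0 (mod q), and the Euclidean norm of z is at most beta."""
    if not any(z):
        return False
    if math.sqrt(sum(x * x for x in z)) > beta:
        return False
    return all(sum(a * x for a, x in zip(row, z)) % q == 0 for row in A)


def trivial_solution(m: int, q: int) -> List[int]:
    """z = q * e_1 always satisfies A z = 0 (mod q), with Euclidean norm exactly q."""
    return [q] + [0] * (m - 1)
```

For any $A$, `trivial_solution(m, q)` passes the check with $\beta = q$ but fails for any $\beta < q$, which is why meaningful instances require $\beta < q$.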
The LWE problem was introduced in the celebrated work of Regev [24], and has the same parameters $n$ and $q$, along with a "noise rate" $\alpha \in (0, 1)$. The problem (in its search form) is to find a secret vector $s \in \mathbb{Z}_q^n$, given a "noisy" random linear system $A \in \mathbb{Z}_q^{n \times m}$, $b = A^T s + e \bmod q$, where $A$ is uniformly random and the entries of $e$ are i.i.d. from a Gaussian-like distribution with standard deviation roughly $\alpha q$. Regev showed that as long as $\alpha q \geq 2\sqrt{n}$, solving LWE on the average (with noticeable probability) is at least as hard as approximating lattice problems in the worst case to within $\tilde{O}(n/\alpha)$ factors using a quantum algorithm. Subsequently, Peikert [21] gave a classical reduction for a subset of the lattice problems and the same approximation factors, but under the additional condition that $q \geq 2^{n/2}$ (or $q \geq 2\sqrt{n}/\alpha$ based on some non-standard lattice problems).
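For concreteness, here is a toy generator of search-LWE instances in the notation above, as a Python sketch. The error distribution is a rounded continuous Gaussian with standard deviation `sigma` (playing the role of roughly $\alpha q$); this is our stand-in for the "Gaussian-like" distribution, not an exact discrete Gaussian sampler, and the parameters used below are far too small for security.

```python
import random


def lwe_instance(n: int, m: int, q: int, sigma: float, rng: random.Random):
    """Return (A, b, s, e) with A in Z_q^{n x m} and b = A^T s + e mod q."""
    A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    s = [rng.randrange(q) for _ in range(n)]
    e = [round(rng.gauss(0.0, sigma)) for _ in range(m)]
    # Entry j of A^T s is the inner product of column j of A with s.
    b = [(sum(A[i][j] * s[i] for i in range(n)) + e[j]) % q for j in range(m)]
    return A, b, s, e
```

Given the secret, subtracting $A^T s$ from $b$ recovers $e$ modulo $q$; the search problem is to find $s$ without it.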

A significant line of research has been devoted to improving the tightness of worst-case/average-case connections for lattice problems. For SIS, a series of works [1, 7, 14, 19, 12] gave progressively better parameters that guarantee hardness, and smaller approximation factors for the underlying lattice problems. The state of the art (from [12], building upon techniques introduced in [19]) shows that for $q \geq \beta \cdot \omega(\sqrt{n \log n})$, finding a SIS solution with norm bounded by $\beta$ is as hard as approximating worst-case lattice problems to within $\tilde{O}(\beta\sqrt{n})$ factors. (The parameter $m$ does not play any significant role in the hardness results, and can be any polynomial in $n$.) For LWE, Regev's initial result remains the tightest, and the requirement that $q \geq \sqrt{n}/\alpha$ (i.e., that the errors have magnitude at least $\sqrt{n}$) is in some sense optimal: a clever algorithm due to Arora and Ge [2] solves LWE in subexponential time when the errors are asymptotically smaller than $\sqrt{n}$, so a proof of hardness for substantially smaller errors would imply a subexponential-time (quantum) algorithm for approximate lattice problems, which would be a major breakthrough. Interestingly, the current modulus bound for LWE is in some sense better than the one for SIS by an $\tilde{\Omega}(\sqrt{n})$ factor: there are applications of LWE for $1/\alpha = \tilde{O}(1)$ and hence $q = \tilde{O}(\sqrt{n})$, whereas SIS is only useful for $\beta \geq \sqrt{n}$, and therefore requires $q > n$ according to the state-of-the-art reductions.

Further  investigating  the  smallest  parameters  for  which  SIS  and  LWE  remain  provably  hard  is  important 
from  both  a  practical  and  theoretical  perspective.  On  the  practical  side,  improvements  would  lead  to 
smaller  cryptographic  keys  without  compromising  the  theoretical  security  guarantees,  or  may  provide  greater 
confidence  in  more  practical  parameter  settings  that  so  far  lack  provable  hardness.  Also,  proving  the  hardness 
of  LWE  for  non-Gaussian  error  distributions  (e.g.,  uniform  over  a  small  set)  would  make  applications  easier 
to  implement.  Theoretically,  improvements  may  eventually  shed  light  on  related  problems  like  Learning 
Parity  with  Noise  (LPN),  which  can  be  seen  as  a  special  case  of  LWE  for  modulus  q  =  2,  and  which  is 
widely  used  in  coding-based  cryptography,  but  which  has  no  known  proof  of  hardness. 

1.1  Our  Results 

We  prove  two  complementary  results  on  the  hardness  of  SIS  and  LWE  with  small  parameters.  For  SIS,  we 
show  that  the  problem  retains  its  hardness  for  moduli  q  nearly  equal  to  the  solution  bound  ft.  For  LWE,  we 
show that it remains hard even when the errors are small (e.g., uniformly random from $\{0, 1\}$), provided that
the  number  m  of  noisy  equations  is  small  enough.  This  qualification  is  necessary  in  light  of  the  Arora-Ge 
attack  [2],  which  for  large  enough  m  can  solve  LWE  with  binary  errors  in  polynomial  time.  Details  follow. 
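Why the number of samples must be limited can be seen from the Arora-Ge idea, illustrated by the Python sketch below (a toy check, not an implementation of their algorithm). With binary errors, every sample satisfies the degree-2 equation $(b_j - \langle a_j, s\rangle)(b_j - \langle a_j, s\rangle - 1) \equiv 0 \pmod q$ in the unknown $s$, where $a_j$ is column $j$ of $A$. Treating each nonconstant monomial of degree at most 2 as a fresh unknown turns these equations into a linear system with $n + n(n+1)/2$ unknowns, so roughly $n^2$ samples make it solvable by linear algebra, whereas with only $m = O(n)$ samples this linearization has far more unknowns than equations.

```python
import random


def binary_error_lwe(n: int, m: int, q: int, rng: random.Random):
    """Toy LWE instance b = A^T s + e mod q with e uniform in {0,1}^m."""
    A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    s = [rng.randrange(q) for _ in range(n)]
    e = [rng.randrange(2) for _ in range(m)]
    b = [(sum(A[i][j] * s[i] for i in range(n)) + e[j]) % q for j in range(m)]
    return A, b, s


def arora_ge_residuals(A, b, s, q):
    """Evaluate P_j(s) = r_j * (r_j - 1) mod q with r_j = b_j - <a_j, s>, for each column a_j of A."""
    n, m = len(A), len(A[0])
    out = []
    for j in range(m):
        r = (b[j] - sum(A[i][j] * s[i] for i in range(n))) % q
        out.append(r * (r - 1) % q)
    return out


def linearization_unknowns(n: int) -> int:
    """Nonconstant monomials of degree <= 2 in n variables: n linear plus n(n+1)/2 quadratic."""
    return n + n * (n + 1) // 2
```

At the true secret every residual vanishes, while a wrong candidate almost surely leaves some residual nonzero; for $n = 10$ the linearized system already has 65 unknowns.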

SIS with small modulus. Our first theorem says that SIS retains its hardness with a modulus as small as $q \geq \beta \cdot n^{\delta}$, for any $\delta > 0$. Recall that the best previous reduction [12] required $q \geq \beta \cdot \omega(\sqrt{n \log n})$, and that SIS becomes trivially easy for $q \leq \beta$, so the $q$ obtained by our proof is essentially optimal. It also essentially closes the gap between LWE and SIS, in terms of how small a useful modulus can be. More precisely, the following is a special case of our main SIS hardness theorem; see Section 3 for full details.
Theorem 1.1 (Corollary of Theorem 3.8). Let $n$ and $m = \mathrm{poly}(n)$ be integers, let $\beta \geq \beta_\infty \geq 1$ be reals, let $Z = \{z \in \mathbb{Z}^m : \|z\|_2 \leq \beta \text{ and } \|z\|_\infty \leq \beta_\infty\}$, and let $q \geq \beta \cdot n^{\delta}$ for some constant $\delta > 0$. Then solving (on the average, with non-negligible probability) SIS with parameters $n, m, q$ and solution set $Z \setminus \{0\}$ is at least as hard as approximating lattice problems in the worst case on $n$-dimensional lattices to within $\gamma = \max\{1, \beta \cdot \beta_\infty / q\} \cdot \tilde{O}(\beta\sqrt{n})$ factors.

Of course, the $\ell_\infty$ bound on the SIS solutions can easily be removed simply by setting $\beta_\infty = \beta$, so that $\|z\|_\infty \leq \|z\|_2 \leq \beta$ automatically holds true. We include an explicit bound $\beta_\infty \leq \beta$ in order to obtain more precise hardness results, based on potentially smaller worst-case approximation factors $\gamma$. We point out that the bound $\beta_\infty$ and the associated extra term $\max\{1, \beta \cdot \beta_\infty / q\}$ in the worst-case approximation factor are not present in previous results. Notice that this term can be as small as 1 (if we take $q \geq \beta \cdot \beta_\infty$, and in particular if $\beta_\infty \leq n^{\delta}$), and as large as $\beta / n^{\delta}$ (if $\beta_\infty = \beta$). This may be seen as the first theoretical evidence that, at least when using a small modulus $q$, restricting the $\ell_\infty$ norm of the solutions may make the SIS problem qualitatively harder than just restricting the $\ell_2$ norm. There is already significant empirical evidence for this belief: the most practically efficient attacks on SIS, which use lattice basis reduction (e.g., [11, 8]), only find solutions with bounded $\ell_2$ norm, whereas combinatorial attacks such as [5, 25] (see also [20]) or theoretical lattice attacks [9] that can guarantee an $\ell_\infty$ bound are much more costly in practice, and also require exponential space. Finally, we mention that setting $\beta_\infty \ll \beta$ is very natural in the usual formulations of one-way and collision-resistant hash functions based on SIS, where collisions correspond (for example) to vectors in $\{-1, 0, 1\}^m$, and therefore have $\ell_\infty$ bound $\beta_\infty = 1$ but $\ell_2$ bound $\beta = \sqrt{m}$. Similar gaps between $\beta_\infty$ and $\beta$ can easily be enforced in other applications, e.g., digital signatures [12].
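To illustrate this gap between $\beta_\infty$ and $\beta$, the Python sketch below brute-forces a collision of the toy hash $f_A(x) = Ax \bmod q$ on $x \in \{0, 1\}^m$ (feasible only for tiny parameters; the function choice and parameters are our illustration). When $2^m > q^n$, the pigeonhole principle guarantees a collision $x \neq x'$, and $z = x - x'$ is a SIS solution in $\{-1, 0, 1\}^m$ with $\|z\|_\infty = 1$ and $\|z\|_2 \leq \sqrt{m}$.

```python
import itertools
import math
import random


def find_collision_difference(A, q):
    """Return z = x - x' for the first colliding pair x != x' in {0,1}^m under x -> A x mod q."""
    n, m = len(A), len(A[0])
    seen = {}
    for x in itertools.product((0, 1), repeat=m):
        image = tuple(sum(A[r][c] * x[c] for c in range(m)) % q for r in range(n))
        if image in seen:
            return [a - b for a, b in zip(x, seen[image])]
        seen[image] = x
    return None  # no collision, possible only if 2**m <= q**n


if __name__ == "__main__":
    rng = random.Random(1)
    n, m, q = 2, 6, 5  # 2**6 = 64 > 5**2 = 25, so a collision must exist
    A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    z = find_collision_difference(A, q)
    print(z, max(map(abs, z)), math.sqrt(sum(v * v for v in z)))
```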

LWE  with  small  errors.  In  the  case  of  LWE,  we  prove  a  general  theorem  offering  a  trade-off  among 
several  different  parameters,  including  the  size  of  the  errors,  the  dimension  and  number  of  samples  in  the 
LWE  problem,  and  the  dimension  of  the  underlying  worst-case  lattice  problems.  Here  we  mention  just  one 
instantiation  for  the  case  of  prime  modulus  and  uniformly  distributed  binary  (i.e.,  0-1)  errors,  and  refer  the 
reader  to  Section  4  and  Theorem  4.6  for  the  more  general  statement  and  a  discussion  of  the  parameters. 

Theorem 1.2 (Corollary of Theorem 4.6). Let $n$ and $m = n \cdot (1 + \Omega(1/\log n))$ be integers, and $q \geq n^{O(1)}$ a sufficiently large polynomially bounded (prime) modulus. Then solving LWE with parameters $n, m, q$ and independent uniformly random binary errors (i.e., in $\{0, 1\}$) is at least as hard as approximating lattice problems in the worst case on $\Theta(n/\log n)$-dimensional lattices within a factor $\gamma = \tilde{O}(\sqrt{n} \cdot q)$.

We remark that our results (see Theorem 4.6) apply to many other settings, including error vectors $e \in X$ chosen from any (sufficiently large) subset $X \subseteq \{0, 1\}^m$ of binary strings, as well as error vectors with larger entries. Interestingly, our hardness result for LWE with very small errors relies on the worst-case hardness of lattice problems in dimension $n' = \Theta(n/\log n)$, which is smaller than (but still quasi-linear in) the dimension $n$ of the LWE problem; however, this is needed only when considering very small error vectors. Theorem 4.6 also shows that if $e$ is chosen uniformly at random with entries bounded by $n^{\epsilon}$ (which is still much smaller than $\sqrt{n}$), then the dimension of the underlying worst-case lattice problems (and the number $m - n$ of extra samples, beyond the LWE dimension $n$) can be linear in $n$.

The restriction that the number of LWE samples $m = O(n)$ be linear in the dimension of the secret can also be relaxed slightly. But some restriction is necessary, because LWE with small errors can be solved in polynomial time when given an arbitrarily large polynomial number of samples. We focus on linear $m = O(n)$ because this is enough for most (but not all) applications in lattice cryptography, including identity-based encryption and fully homomorphic encryption, when the parameters are set appropriately. (The one exception that we know of is the security proof for pseudorandom functions [3].)
1.2  Techniques  and  Comparison  to  Related  Work

Our  results  for  SIS  and  LWE  are  technically  disjoint,  and  all  they  have  in  common  is  the  goal  of  proving 
hardness  results  for  smaller  values  of  the  parameters.  So,  we  describe  our  technical  contributions  in  the 
analysis  of  these  two  problems  separately. 

SIS  with  small  modulus.  For  SIS,  as  a  warm-up,  we  first  give  a  proof  for  a  special  case  of  the  problem 
where  the  input  is  restricted  to  vectors  of  a  special  form  (e.g.,  binary  vectors).  For  this  restricted  version  of 
SIS,  we  are  able  to  give  a  self-reduction  (from  SIS  to  SIS)  which  reduces  the  size  of  the  modulus.  So,  we  can 
rely  on  previous  worst-case  to  average-case  reductions  for  SIS  as  “black  boxes,”  resulting  in  an  extremely 
simple proof. However, this simple self-reduction has some drawbacks. Besides the undesirable restriction on
the SIS inputs, the reduction is rather loose with respect to the underlying worst-case lattice approximation
problem:  in  order  to  establish  the  hardness  of  SIS  with  small  moduli  q  (and  restricted  inputs),  one  needs 
to  assume  the  worst-case  hardness  of  lattice  problems  for  rather  large  polynomial  approximation  factors. 
(By  contrast,  previous  hardness  results  for  larger  moduli  [19,  12]  only  assumed  hardness  for  quasi-linear 
approximation  factors.)  We  address  both  drawbacks  by  giving  a  direct  reduction  from  worst-case  lattice 
problems  to  SIS  with  small  modulus.  This  is  our  main  SIS  result,  and  it  combines  ideas  from  previous 
work  [19,  12]  with  two  new  technical  ingredients: 

•  All  previous  SIS  hardness  proofs  [1,  7,  14,  19,  12]  solved  worst-case  lattice  problems  by  iteratively 
finding  (sets  of  linearly  independent)  lattice  vectors  of  shorter  and  shorter  length.  Our  first  new 
technical ingredient (inspired by the pioneering work of Regev [24] on LWE) is the use of a different
intermediate  problem:  instead  of  finding  progressively  shorter  lattice  vectors,  we  consider  the  problem 
of  sampling  lattice  vectors  according  to  Gaussian-like  distributions  of  progressively  smaller  widths. 
To  the  best  of  our  knowledge,  this  is  the  first  use  of  Gaussian  lattice  sampling  as  an  intermediate 
worst-case  problem  in  the  study  of  SIS,  and  it  appears  necessary  to  lower  the  SIS  modulus  below  n. 
We  mention  that  Gaussian  lattice  sampling  has  been  used  before  to  reduce  the  modulus  in  hardness 
reductions  for  SIS  [12],  but  still  within  the  framework  of  iteratively  finding  short  vectors  (which  in  [12] 
are  used  to  generate  fresh  Gaussian  samples  for  the  reduction),  which  results  in  larger  moduli  q  >  n. 

•  The  use  of  Gaussian  lattice  sampling  as  an  intermediate  problem  within  the  SIS  hardness  proof  yields 
linear combinations of several discrete Gaussian samples with adversarially chosen coefficients. Our
second technical ingredient, used to analyze these linear combinations, is a new convolution theorem
for discrete Gaussians (Theorem 3.3), which strengthens similar ones previously proved in [22, 6].
Here again, the strength of our new convolution theorem appears necessary to obtain hardness results
for  SIS  with  modulus  smaller  than  n. 

Our  new  convolution  theorem  may  be  of  independent  interest,  and  might  find  applications  in  the  analysis  of 
other  lattice  algorithms. 

LWE with small errors. We now move to our results on LWE. For this problem, the best provably hard parameters to date were those obtained in the original paper of Regev [24], which employed Gaussian errors, and required them to have (expected) magnitude at least $\sqrt{n}$. These results were believed to be optimal due to a clever algorithm of Arora and Ge [2], which solves LWE in subexponential time when the errors are asymptotically smaller than $\sqrt{n}$. The possibility of circumventing this barrier by limiting the number of LWE samples was first suggested by Micciancio and Mol [17], who gave "sample preserving" search-to-decision reductions for LWE, and asked if LWE with small uniform errors could be proved hard when the number of available samples is sufficiently small. Our results provide a first answer to this question, and employ concepts and techniques from the work of Peikert and Waters [23] (see also [4]) on lossy (trapdoor) functions. In brief, a lossy function family is an indistinguishable pair of function families $F, \mathcal{L}$ such that functions in $F$ are injective and those in $\mathcal{L}$ are lossy, in the sense that they map their common domain to much smaller sets, and therefore lose information about the input. As shown in [23], from the indistinguishability of $F$ and $\mathcal{L}$, it follows that the families $F$ and $\mathcal{L}$ are both one-way.
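A toy way to see "lossiness" is to count images. The Python sketch below (our illustration, not the function families $F$ or $\mathcal{L}$ used in this work) enumerates the subset-sum function $x \mapsto Ax \bmod q$ on $\{0, 1\}^m$. Its image has at most $q^n$ elements, so for a uniformly random input the function must lose at least $m - \log_2 |\mathrm{image}| \geq m - n \log_2 q$ bits of information, which is substantial when $2^m \gg q^n$.

```python
import itertools
import math
import random


def image_size(A, q) -> int:
    """Number of distinct values of x -> A x mod q over all x in {0,1}^m."""
    n, m = len(A), len(A[0])
    images = {
        tuple(sum(A[r][c] * x[c] for c in range(m)) % q for r in range(n))
        for x in itertools.product((0, 1), repeat=m)
    }
    return len(images)


def bits_lost_lower_bound(A, q) -> float:
    """Lower bound on H(x | f(x)) = m - H(f(x)) >= m - log2|image| for uniform x in {0,1}^m."""
    m = len(A[0])
    return m - math.log2(image_size(A, q))
```

For instance, with a single row ($n = 1$), $q = 7$ and $m = 10$, at most 7 outputs are shared among 1024 inputs, so at least $10 - \log_2 7 \approx 7.2$ bits about the input are lost.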

In  Section  2  we  present  a  generalized  framework  for  the  study  of  lossy  function  families,  which  does  not 
require  the  functions  to  have  trapdoors,  and  applies  to  arbitrary  (not  necessarily  uniform)  input  distributions. 
While the techniques we use are all standard, and our definitions are minor generalizations of the ones given
in  [23],  we  believe  that  our  framework  provides  a  conceptual  simplification  of  previous  work,  relating  the 
relatively  new  notion  of  lossy  functions  to  the  classic  security  definitions  of  second-preimage  resistance  and 
uninvertibility. 

The lossy function framework is used to prove the hardness of LWE with small uniform errors and
(necessarily) a small number of samples. Specifically, we use the standard LWE problem (with large
Gaussian errors) to set up a lossy function family $\mathcal{F}, \mathcal{L}$. (Similar families with trapdoors were constructed
in [23, 4], but not for the parameterizations required to obtain interesting hardness results for LWE.) The
indistinguishability of $\mathcal{F}$ and $\mathcal{L}$ follows directly from the hardness of the underlying LWE problem. The
new hardness result for LWE (with small errors) is equivalent to the one-wayness of $\mathcal{F}$, and is proved by
a relatively standard analysis of the second-preimage resistance and uninvertibility of certain subset-sum
functions associated to $\mathcal{L}$.

Comparison to related work. In an independent work that was submitted concurrently with ours, Döttling
and Müller-Quade [10] also used a lossiness argument to prove new hardness results for LWE. (Their work
does not address the SIS problem.) At a syntactic level, they use LWE (i.e., generating matrix) notation and
a new concept they call “lossy codes,” while here we use SIS (i.e., parity-check matrix) notation and rely
on the standard notions of uninvertible and second-preimage resistant functions. By the dual equivalence of
SIS and LWE [15, 17] (see Proposition 2.9), this can be considered a purely syntactic difference, and the
high-level lossiness strategy (including the lossy function family construction) used in [10] and in our work
is essentially the same. However, the low-level analysis techniques and final results are quite different. The
main result proved in [10] is essentially the following.

Theorem 1.3 ([10]). Let $n, q, m = n^{O(1)}$ and $r \geq n^{1/2+\epsilon} \cdot m$ be integers, for an arbitrary small constant $\epsilon > 0$. Then the LWE problem with parameters $n, m, q$ and independent uniformly distributed errors in $\{-r, \ldots, r\}^m$ is at least as hard as (quantumly) solving worst-case problems on $(n/2)$-dimensional lattices to within a factor $\gamma = n^{1+\epsilon} \cdot mq/r$.

The contribution of [10] over previous work is to prove the hardness of LWE for uniformly distributed
errors, as opposed to errors that follow a Gaussian distribution. Notice that the magnitude of the errors used
in [10] is always at least $\sqrt{n} \cdot m$, which is substantially larger (by a factor of $m$) than in previous results. So,
[10] makes no progress towards reducing the magnitude of the errors, which is the main goal of this paper.
By contrast, our work shows the hardness of LWE for errors smaller than $\sqrt{n}$ (indeed, as small as $\{0, 1\}$),
provided the number of samples is sufficiently small.

Like our work, [10] requires the number of LWE samples $m$ to be fixed in advance (because the error
magnitude $r$ depends on $m$), but it allows $m$ to be an arbitrary polynomial in $n$. This is possible because
for the large errors $r \gg \sqrt{n}$ considered in [10], the attack of [2] runs in at least exponential time. So, in
principle, it may even be possible (and is an interesting open problem) to prove the hardness of LWE with
(large) uniform errors as in [10], but for an unbounded number of samples. In our work, hardness of LWE
for errors smaller than $\sqrt{n}$ is proved for a much smaller number of samples $m$, and this is necessary in order
to avoid the subexponential time attack of [2].

While the focus of our work is on LWE with small errors, we remark that our main LWE hardness result
(Theorem 4.6) can also be instantiated using large polynomial errors $r = n^{O(1)}$ to obtain any (linear) number
of samples $m = O(n)$. In this setting, [10] provides a much better dependency between the magnitude of the
errors and the number of samples (which in [10] can be an arbitrary polynomial). This is due to substantial
differences in the low-level techniques employed in [10] and in our work to analyze the statistical properties
of the lossy function family. For these same reasons, even for large errors, our results seem incomparable to
those of [10] because we allow for a much wider class of error distributions.

2  Preliminaries 

We use uppercase roman letters $F, X$ for sets, lowercase roman for set elements $x \in X$, bold $\mathbf{x} \in X^n$
for vectors, and calligraphic letters $\mathcal{F}, \mathcal{X}, \ldots$ for probability distributions. The support of a probability
distribution $\mathcal{X}$ is denoted $[\mathcal{X}]$. The uniform distribution over a finite set $X$ is denoted $\mathcal{U}(X)$.

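Two probability distributions $\mathcal{X}$ and $\mathcal{Y}$ are $(t, \epsilon)$-indistinguishable if for all (probabilistic) algorithms $\mathcal{D}$
running in time at most $t$,

$$\left|\Pr[x \leftarrow \mathcal{X} : \mathcal{D}(x) \text{ accepts}] - \Pr[y \leftarrow \mathcal{Y} : \mathcal{D}(y) \text{ accepts}]\right| \leq \epsilon.$$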

2.1  One-Way  Functions 

A function family is a probability distribution $\mathcal{F}$ over a set of functions $F \subseteq (X \to Y)$ with common
domain $X$ and range $Y$. Formally, function families are defined as distributions over bit strings (function
descriptions) together with an evaluation algorithm, mapping each bitstring to a corresponding function, with
possibly multiple descriptions associated to the same function. In this paper, for notational simplicity, we
identify functions and their description, and unless stated otherwise, all statements about function families
should be interpreted as referring to the corresponding probability distributions over function descriptions.
For example, if we say that two function families $\mathcal{F}$ and $\mathcal{G}$ are indistinguishable, we mean that no efficient
algorithm can distinguish between function descriptions selected according to either $\mathcal{F}$ or $\mathcal{G}$, where $\mathcal{F}$ and
$\mathcal{G}$ are probability distributions over bitstrings that are interpreted as functions using the same evaluation
algorithm.

A function family $\mathcal{F}$ is $(t, \epsilon)$ collision resistant if for all (probabilistic) algorithms $\mathcal{A}$ running in time at
most $t$,

$$\Pr[f \leftarrow \mathcal{F}, (x, x') \leftarrow \mathcal{A}(f) : f(x) = f(x') \wedge x \neq x'] \leq \epsilon.$$

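Let $\mathcal{X}$ be a probability distribution over the domain $X$ of a function family $\mathcal{F}$. We recall the following
standard security notions:

• $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-one-way if for all probabilistic algorithms $\mathcal{A}$ running in time at most $t$,

$$\Pr[f \leftarrow \mathcal{F}, x \leftarrow \mathcal{X} : \mathcal{A}(f, f(x)) \in f^{-1}(f(x))] \leq \epsilon.$$

• $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-uninvertible if for all probabilistic algorithms $\mathcal{A}$ running in time at most $t$,

$$\Pr[f \leftarrow \mathcal{F}, x \leftarrow \mathcal{X} : \mathcal{A}(f, f(x)) = x] \leq \epsilon.$$

• $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-second preimage resistant if for all probabilistic algorithms $\mathcal{A}$ running in time at most $t$,

$$\Pr[f \leftarrow \mathcal{F}, x \leftarrow \mathcal{X}, x' \leftarrow \mathcal{A}(f, x) : f(x) = f(x') \wedge x \neq x'] \leq \epsilon.$$

• $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-pseudorandom if the distributions $\{f \leftarrow \mathcal{F}, x \leftarrow \mathcal{X} : (f, f(x))\}$ and $\{f \leftarrow \mathcal{F}, y \leftarrow \mathcal{U}(Y) : (f, y)\}$ are $(t, \epsilon)$-indistinguishable.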

The  above  probabilities  (or  the  absolute  difference  between  probabilities,  for  indistinguishability)  are 
called  the  advantages  in  breaking  the  corresponding  security  notions.  It  easily  follows  from  the  definition 
that  if  a  function  family  is  one-way  with  respect  to  any  input  distribution  X,  then  it  is  also  uninvertible  with 
respect  to  the  same  input  distribution  X.  Also,  if  a  function  family  is  collision  resistant,  then  it  is  also  second 
preimage  resistant  with  respect  to  any  efficiently  samplable  input  distribution. 

All security definitions are immediately adapted to the asymptotic setting, where we implicitly consider
sequences of finite function families indexed by a security parameter. In this setting, for any security definition
(one-wayness, collision resistance, etc.) we omit $t$, and simply say that a function is secure if for any $t$ that is
polynomial in the security parameter, it is $(t, \epsilon)$-secure for some $\epsilon$ that is negligible in the security parameter.
We say that a function family is statistically secure if it is $(t, \epsilon)$-secure for some negligible $\epsilon$ and arbitrary $t$,
i.e., it is secure even with respect to computationally unbounded adversaries.

The composition of function families is defined in the natural way. Namely, for any two function families
with $[\mathcal{F}] \subseteq X \to Y$ and $[\mathcal{G}] \subseteq Y \to Z$, the composition $\mathcal{G} \circ \mathcal{F}$ is the function family that selects $f \leftarrow \mathcal{F}$ and
$g \leftarrow \mathcal{G}$ independently at random, and outputs the function $(g \circ f) : X \to Z$.

2.2  Lossy  Function  Families 

Lossy  functions,  introduced  in  [23],  are  usually  defined  in  the  context  of  trapdoor  function  families,  where 
the functions are efficiently invertible with the help of some trapdoor information, and therefore injective (at
least  with  high  probability  over  the  choice  of  the  key).  We  give  a  more  general  definition  of  lossy  function 
families  that  applies  to  non-injective  functions  and  arbitrary  input  distributions,  though  we  will  be  mostly 
interested  in  input  distributions  that  are  uniform  over  some  set. 

Definition 2.1. Let $\mathcal{L}, \mathcal{F}$ be two probability distributions (with possibly different supports) over the same set
of (efficiently computable) functions $F \subseteq X \to Y$, and let $\mathcal{X}$ be an efficiently sampleable distribution over
the domain $X$. We say that $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family if the following properties are satisfied:

• the distributions $\mathcal{L}$ and $\mathcal{F}$ are indistinguishable,

• $(\mathcal{L}, \mathcal{X})$ is uninvertible, and

• $(\mathcal{F}, \mathcal{X})$ is second preimage resistant.

The  uninvertibility  and  second  preimage  resistance  properties  can  be  either  computational  or  statistical. 
(The  definition  from  [23]  requires  both  to  be  statistical.)  We  remark  that  uninvertible  functions  and  second 
preimage  resistant  functions  are  not  necessarily  one-way.  For  example,  the  constant  function  f(x)  =  0  is 
(statistically) uninvertible when $|X|$ is super-polynomial in the security parameter, and the identity function
f(x)  =  x  is  (statistically)  second  preimage  resistant  (in  fact,  even  collision  resistant),  but  neither  is  one-way. 
Still,  if  a  function  family  is  simultaneously  uninvertible  and  second  preimage  resistant,  then  one-wayness 
easily  follows. 

Lemma 2.2. Let $\mathcal{F}$ be a family of functions computable in time $t'$. If $(\mathcal{F}, \mathcal{X})$ is both $(t, \epsilon)$-uninvertible and
$(t + t', \epsilon')$-second preimage resistant, then it is also $(t, \epsilon + \epsilon')$-one-way.
Proof. Let $\mathcal{A}$ be an algorithm running in time at most $t$ and attacking the one-wayness property of $(\mathcal{F}, \mathcal{X})$.
Let $f \leftarrow \mathcal{F}$ and $x \leftarrow \mathcal{X}$ be chosen at random, and compute $y \leftarrow \mathcal{A}(f, f(x))$. We want to bound the
probability that $f(x) = f(y)$. We consider two cases:

• If $x = y$, then $\mathcal{A}$ breaks the uninvertibility property of $(\mathcal{F}, \mathcal{X})$.

• If $x \neq y$, then $\mathcal{A}'(f, x) = \mathcal{A}(f, f(x))$, which runs in time at most $t + t'$ since it must first compute $f(x)$, breaks the second preimage property of $(\mathcal{F}, \mathcal{X})$.

By assumption, the probabilities of these two events are at most $\epsilon$ and $\epsilon'$ respectively. By the union bound, $\mathcal{A}$
breaks the one-wayness property with advantage at most $\epsilon + \epsilon'$. □

It easily follows by a simple indistinguishability argument that if $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family,
then both $(\mathcal{L}, \mathcal{X})$ and $(\mathcal{F}, \mathcal{X})$ are one-way.

Lemma 2.3. Let $\mathcal{F}$ and $\mathcal{F}'$ be any two indistinguishable, efficiently computable function families, and let $\mathcal{X}$
be an efficiently sampleable input distribution. Then if $(\mathcal{F}, \mathcal{X})$ is uninvertible (respectively, second-preimage
resistant), then $(\mathcal{F}', \mathcal{X})$ is also uninvertible (resp., second-preimage resistant). In particular, if $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a
lossy function family, then $(\mathcal{L}, \mathcal{X})$ and $(\mathcal{F}, \mathcal{X})$ are both one-way.

Proof. Assume that $(\mathcal{F}, \mathcal{X})$ is uninvertible and that there exists an efficient algorithm $\mathcal{A}$ breaking the
uninvertibility property of $(\mathcal{F}', \mathcal{X})$. Then $\mathcal{F}$ and $\mathcal{F}'$ can be efficiently distinguished by the following
algorithm $\mathcal{D}(f)$: choose $x \leftarrow \mathcal{X}$, compute $x' \leftarrow \mathcal{A}(f, f(x))$, and accept if $\mathcal{A}$ succeeded, i.e., if $x = x'$.

Next, assume that $(\mathcal{F}, \mathcal{X})$ is second preimage resistant, and that there exists an efficient algorithm $\mathcal{A}$
breaking the second preimage resistance property of $(\mathcal{F}', \mathcal{X})$. Then $\mathcal{F}$ and $\mathcal{F}'$ can be efficiently distinguished
by the following algorithm $\mathcal{D}(f)$: choose $x \leftarrow \mathcal{X}$, compute $x' \leftarrow \mathcal{A}(f, x)$, and accept if $\mathcal{A}$ succeeded, i.e., if
$x \neq x'$ and $f(x) = f(x')$.

It follows that if $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family, then $(\mathcal{L}, \mathcal{X})$ and $(\mathcal{F}, \mathcal{X})$ are both uninvertible and
second preimage resistant. Therefore, by Lemma 2.2, they are also one-way. □

The standard definition of (injective) lossy trapdoor functions [23] is usually stated by requiring the ratio
$|f(X)|/|X|$ to be small. Our general definition can easily be related to the standard definition by specializing
it to uniform input distributions. The next lemma gives an equivalent characterization of uninvertible functions
when the input distribution is uniform.

Lemma 2.4. Let $\mathcal{L}$ be a family of functions on a common domain $X$, and let $\mathcal{X} = \mathcal{U}(X)$ be the uniform
input distribution over $X$. Then $(\mathcal{L}, \mathcal{X})$ is $\epsilon$-uninvertible (even statistically, with respect to computationally
unbounded adversaries) for $\epsilon = \mathbb{E}_{f \leftarrow \mathcal{L}}[|f(X)|]/|X|$.

Proof. Fix a function $f$, and choose a random input $x \leftarrow \mathcal{X}$. The best (computationally unbounded) attack
on the uninvertibility of $(\mathcal{L}, \mathcal{X})$, given input $f$ and $y = f(x)$, outputs an $x' \in X$ such that $f(x') = y$ and
the probability of $x'$ under $\mathcal{X}$ is maximized. Since $\mathcal{X}$ is the uniform distribution over $X$, the conditional
distribution of $x$ given $y$ is uniform over $f^{-1}(y)$, and the attack succeeds with probability $1/|f^{-1}(y)|$. Each $y$
is output by $f$ with probability $|f^{-1}(y)|/|X|$. So, the success probability of the attack is

$$\sum_{y \in f(X)} \frac{|f^{-1}(y)|}{|X|} \cdot \frac{1}{|f^{-1}(y)|} = \frac{|f(X)|}{|X|}.$$

Taking the expectation over the choice of $f$, we get that the attacker succeeds with probability $\epsilon$. □
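To make Lemma 2.4 concrete, the following Python sketch computes, by exhaustive enumeration, the exact success probability of an optimal unbounded inverter for one fixed toy function $f_{\mathbf{A}}(\mathbf{x}) = \mathbf{A}\mathbf{x} \bmod q$ (the SIS-style function of Section 2.4) on the uniform domain $\{0,1\}^m$, and compares it with $|f(X)|/|X|$. The tiny parameters and helper names are illustrative assumptions; the lemma itself averages this quantity over $f \leftarrow \mathcal{L}$.

```python
import itertools
import random
from fractions import Fraction


def f_A(A, x, q):
    """Toy SIS-style function f_A(x) = A x mod q (A is a list of rows)."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % q for row in A)


def best_inverter_success(A, q, domain):
    """Exact success probability of an optimal (unbounded) inverter when x is
    uniform over `domain`. Given y, every preimage of y is equally likely, so
    always answering one fixed preimage of y is an optimal strategy."""
    guess = {}
    for x in domain:
        guess.setdefault(f_A(A, x, q), x)   # one fixed preimage per output y
    wins = sum(1 for x in domain if guess[f_A(A, x, q)] == x)
    return Fraction(wins, len(domain)), len(guess)


rng = random.Random(1)
n, m, q = 2, 6, 5
A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
domain = list(itertools.product(range(2), repeat=m))   # X = {0,1}^m
success, image_size = best_inverter_success(A, q, domain)
print(success, Fraction(image_size, len(domain)))       # the two values coincide
```

Exactly one input per output value is inverted correctly, which is the counting behind the sum in the proof.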
We conclude this section with the observation that uninvertibility behaves as expected with respect to
function  composition. 

Lemma 2.5. If $(\mathcal{F}, \mathcal{X})$ is uninvertible and $\mathcal{G}$ is any family of efficiently computable functions, then $(\mathcal{G} \circ \mathcal{F}, \mathcal{X})$
is also uninvertible.

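Proof. Any inverter $\mathcal{A}$ for $(\mathcal{G} \circ \mathcal{F}, \mathcal{X})$ can be easily transformed into an inverter $\mathcal{A}'(f, y)$ for $(\mathcal{F}, \mathcal{X})$ that chooses
$g \leftarrow \mathcal{G}$ at random, and outputs the result of running $\mathcal{A}(g \circ f, g(y))$. □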

A similar statement holds also for one-wayness, under the additional assumption that $\mathcal{G}$ is second preimage
resistant,  but  it  is  not  needed  here. 

2.3  Lattices  and  Gaussians 

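An $n$-dimensional lattice of rank $k$ is the set $\Lambda$ of integer combinations of $k$ linearly independent vectors
$\mathbf{b}_1, \ldots, \mathbf{b}_k \in \mathbb{R}^n$, i.e., $\Lambda = \{\sum_{i=1}^{k} x_i \mathbf{b}_i \mid x_i \in \mathbb{Z} \text{ for } i = 1, \ldots, k\}$. The matrix $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_k]$ is called
a basis for the lattice $\Lambda$. The dual of a (not necessarily full-rank) lattice $\Lambda$ is the set $\Lambda^* = \{\mathbf{x} \in \operatorname{span}(\Lambda) :
\forall \mathbf{y} \in \Lambda, \langle \mathbf{x}, \mathbf{y} \rangle \in \mathbb{Z}\}$. In what follows, unless otherwise specified we work with full-rank lattices, where
$k = n$.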
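For a full-rank lattice with basis $\mathbf{B}$ (basis vectors as columns), the dual lattice $\Lambda^*$ has basis $\mathbf{B}^{-T}$, a standard fact that follows directly from the definition above. The NumPy sketch below, using an arbitrary small example basis of our choosing, checks numerically that dual vectors have integer inner products with lattice vectors.

```python
import numpy as np


def dual_basis(B):
    """For a full-rank basis B (columns b_1..b_n), return D = B^{-T}. Its columns
    form a basis of the dual lattice: D^T B = I, i.e. <d_i, b_j> = 1 if i == j else 0."""
    return np.linalg.inv(B).T


B = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # columns b_1 = (2, 0), b_2 = (1, 3)
D = dual_basis(B)

rng = np.random.default_rng(0)
for _ in range(5):
    x = D @ rng.integers(-5, 6, size=2)   # random dual-lattice vector
    y = B @ rng.integers(-5, 6, size=2)   # random lattice vector
    ip = x @ y
    assert abs(ip - round(ip)) < 1e-9     # <x, y> is an integer
print(D.T @ B)                            # approximately the 2 x 2 identity
```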

The $i$th successive minimum $\lambda_i(\Lambda)$ is the smallest radius $r$ such that $\Lambda$ contains $i$ linearly independent
vectors of (Euclidean) length at most $r$. A fundamental computational problem in the study of lattice
cryptography is the approximate Shortest Independent Vectors Problem $\mathrm{SIVP}_\gamma$, which, on input a full-rank
$n$-dimensional lattice $\Lambda$ (typically represented by a basis), asks to find $n$ linearly independent lattice vectors
$\mathbf{v}_1, \ldots, \mathbf{v}_n \in \Lambda$ all of length at most $\gamma \cdot \lambda_n(\Lambda)$, where $\gamma \geq 1$ is an approximation factor and is usually a
function of the lattice dimension $n$. Another problem is the (decision version of the) approximate Shortest
Vector Problem $\mathrm{GapSVP}_\gamma$, which, on input an $n$-dimensional lattice $\Lambda$, asks to output “yes” if $\lambda_1(\Lambda) \leq 1$
and “no” if $\lambda_1(\Lambda) > \gamma$. (If neither is the case, any answer is acceptable.)

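For a matrix $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_k]$ of linearly independent vectors, the Gram-Schmidt orthogonalization $\tilde{\mathbf{B}}$
is the matrix of vectors $\tilde{\mathbf{b}}_i$, where $\tilde{\mathbf{b}}_1 = \mathbf{b}_1$, and for each $i = 2, \ldots, k$, the vector $\tilde{\mathbf{b}}_i$ is the projection of $\mathbf{b}_i$
orthogonal to $\operatorname{span}(\mathbf{b}_1, \ldots, \mathbf{b}_{i-1})$. The Gram-Schmidt minimum of a lattice $\Lambda$ is $\mathrm{bl}(\Lambda) = \min_{\mathbf{B}} \|\tilde{\mathbf{B}}\|$, where
$\|\tilde{\mathbf{B}}\| = \max_i \|\tilde{\mathbf{b}}_i\|$ and the minimum is taken over all bases $\mathbf{B}$ of $\Lambda$. Given any basis $\mathbf{D}$ of a lattice $\Lambda$ and
any set $\mathbf{S}$ of linearly independent vectors in $\Lambda$, it is possible to efficiently construct a basis $\mathbf{B}$ of $\Lambda$ such that
$\|\tilde{\mathbf{B}}\| \leq \|\tilde{\mathbf{S}}\|$ (see [16]).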
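The Gram-Schmidt vectors and $\|\tilde{\mathbf{B}}\|$ can be computed directly from the definition. The following NumPy sketch (helper names and the example basis are ours) does so and shows that merely reordering a basis can change $\|\tilde{\mathbf{B}}\|$, which is why $\mathrm{bl}(\Lambda)$ is defined as a minimum over bases.

```python
import numpy as np


def gram_schmidt(B):
    """Return B~, whose columns are the Gram-Schmidt vectors of the columns of B:
    b~_1 = b_1, and b~_i is b_i minus its projection onto span(b_1, ..., b_{i-1})."""
    B = np.asarray(B, dtype=float)
    Bt = np.zeros_like(B)
    for i in range(B.shape[1]):
        v = B[:, i].copy()
        for j in range(i):
            u = Bt[:, j]
            v -= (v @ u) / (u @ u) * u    # remove the component along b~_j
        Bt[:, i] = v
    return Bt


def gs_norm(B):
    """||B~|| = max_i ||b~_i||, the quantity minimized over bases in bl(Lambda)."""
    return float(np.max(np.linalg.norm(gram_schmidt(B), axis=0)))


B1 = np.array([[3.0, 1.0],
               [0.0, 2.0]])       # columns b_1 = (3, 0), b_2 = (1, 2)
B2 = B1[:, ::-1]                  # same lattice, basis vectors in the opposite order
print(gs_norm(B1), gs_norm(B2))   # 3.0 and sqrt(7.2), about 2.683
```

The product of the Gram-Schmidt lengths always equals $|\det \mathbf{B}|$ (here 6), so shortening the longest $\tilde{\mathbf{b}}_i$ necessarily lengthens others.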

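The Gaussian function $\rho_s : \mathbb{R}^m \to \mathbb{R}$ with parameter $s$ is defined as $\rho_s(\mathbf{x}) = \exp(-\pi\|\mathbf{x}\|^2/s^2)$. When $s$
is omitted, it is assumed to be 1. The discrete Gaussian distribution $D_{\Lambda+\mathbf{c},s}$ with parameter $s$ over a lattice
coset $\Lambda + \mathbf{c}$ is the distribution that samples each element $\mathbf{x} \in \Lambda + \mathbf{c}$ with probability $\rho_s(\mathbf{x})/\rho_s(\Lambda + \mathbf{c})$, where
$\rho_s(\Lambda + \mathbf{c}) = \sum_{\mathbf{y} \in \Lambda + \mathbf{c}} \rho_s(\mathbf{y})$ is a normalization factor.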
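In one dimension the definition can be sampled directly. The Python sketch below is a simple table-based sampler for $D_{\mathbb{Z}+c,s}$ that we chose for illustration (it is not a method from the text, is not constant-time, and drops points with $|x| > 10s$, whose total mass is negligible). As a sanity check, for $s$ well above the smoothing parameter of $\mathbb{Z}$ the variance should be close to $s^2/(2\pi)$, the variance of the continuous density proportional to $\rho_s$.

```python
import math
import random


def rho(x, s):
    """One-dimensional Gaussian function rho_s(x) = exp(-pi x^2 / s^2)."""
    return math.exp(-math.pi * x * x / (s * s))


def sample_dgauss_coset(s, count=1, c=0.0, tail=10.0, rng=random):
    """Approximately sample `count` values from D_{Z+c,s}: each x in Z + c is
    chosen with probability proportional to rho_s(x). Points with |x| > tail * s
    are dropped; their total mass decays roughly like exp(-pi * tail^2)."""
    lo = math.floor(-tail * s - c)
    hi = math.ceil(tail * s - c)
    support = [k + c for k in range(lo, hi + 1)]
    weights = [rho(x, s) for x in support]
    return rng.choices(support, weights=weights, k=count)


rng = random.Random(0)
s = 4.0
xs = sample_dgauss_coset(s, count=20000, rng=rng)
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs)
print(round(mean, 3), round(var, 3), round(s * s / (2 * math.pi), 3))
```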

For any $\epsilon > 0$, the smoothing parameter $\eta_\epsilon(\Lambda)$ [19] is the smallest $s > 0$ such that $\rho_{1/s}(\Lambda^* \setminus \{\mathbf{0}\}) \leq \epsilon$.
When $\epsilon$ is omitted, it is some unspecified negligible function $\epsilon = n^{-\omega(1)}$ of the lattice dimension or security
parameter $n$, which may vary from place to place.
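For the integer lattice the definition is easy to evaluate, since $\mathbb{Z}^* = \mathbb{Z}$ and $\rho_{1/s}(\mathbb{Z} \setminus \{0\}) = 2\sum_{k \geq 1} \exp(-\pi s^2 k^2)$, which is strictly decreasing in $s$. The Python sketch below (our own helper names; the search interval and the truncation at 1000 terms are assumptions that are comfortably sufficient here) finds $\eta_\epsilon(\mathbb{Z})$ by bisection. It grows like $\sqrt{\ln(2/\epsilon)/\pi}$, which is why $\eta(\mathbb{Z}) = \omega(\sqrt{\log n})$ when $\epsilon$ is negligible.

```python
import math


def rho_dual_tail_Z(s, kmax=1000):
    """rho_{1/s}(Z* minus {0}) for Z* = Z: 2 * sum_{k>=1} exp(-pi * s^2 * k^2)."""
    return 2.0 * sum(math.exp(-math.pi * (s * k) ** 2) for k in range(1, kmax + 1))


def smoothing_parameter_Z(eps, lo=1e-3, hi=10.0, iters=100):
    """Smallest s > 0 with rho_{1/s}(Z minus {0}) <= eps, by bisection on s.
    Assumes the answer lies in (lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rho_dual_tail_Z(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


for eps in (1e-5, 1e-10, 1e-20, 1e-40):
    print(eps, round(smoothing_parameter_Z(eps), 4))
```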

We  observe  that  the  smoothing  parameter  satisfies  the  following  decomposition  lemma.  The  general  case 
for  the  sum  of  several  lattices  (whose  linear  spans  have  trivial  pairwise  intersections)  follows  immediately  by 
induction. 

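Lemma 2.6. Let lattice $\Lambda = \Lambda_1 + \Lambda_2$ be the (internal direct) sum of two lattices such that $\operatorname{span}(\Lambda_1) \cap
\operatorname{span}(\Lambda_2) = \{\mathbf{0}\}$, and let $\tilde{\Lambda}_2$ be the projection of $\Lambda_2$ orthogonal to $\operatorname{span}(\Lambda_1)$. Then for any $\epsilon_1, \epsilon_2, \epsilon > 0$ such
that $1 + \epsilon = (1 + \epsilon_1)(1 + \epsilon_2)$, we have

$$\eta_\epsilon(\tilde{\Lambda}_2) \leq \eta_\epsilon(\Lambda) \leq \eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2) \leq \max\{\eta_{\epsilon_1}(\Lambda_1), \eta_{\epsilon_2}(\tilde{\Lambda}_2)\}.$$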

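Proof. Let $\Lambda^*$, $\Lambda_1^*$ and $\tilde{\Lambda}_2^*$ be the dual lattices of $\Lambda$, $\Lambda_1$ and $\tilde{\Lambda}_2$, respectively. For the first inequality, notice
that $\tilde{\Lambda}_2^*$ is a sublattice of $\Lambda^*$. Therefore, $\rho_{1/s}(\tilde{\Lambda}_2^* \setminus \{\mathbf{0}\}) \leq \rho_{1/s}(\Lambda^* \setminus \{\mathbf{0}\})$ for any $s > 0$, and thus

$$\eta_\epsilon(\tilde{\Lambda}_2) \leq \eta_\epsilon(\Lambda).$$

Next we prove that $\eta_\epsilon(\Lambda) \leq \eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2)$. It is routine to verify that we can express the dual lattice $\Lambda^*$
as the sum $\Lambda^* = \tilde{\Lambda}_1^* + \tilde{\Lambda}_2^*$, where $\tilde{\Lambda}_1$ is the projection of $\Lambda_1$ orthogonal to $\operatorname{span}(\Lambda_2)$, and $\tilde{\Lambda}_1^*$ is its dual.
Moreover, the projection of $\tilde{\Lambda}_1^*$ orthogonal to $\operatorname{span}(\tilde{\Lambda}_2^*)$ is exactly $\Lambda_1^*$. For any $\mathbf{x}_1 \in \tilde{\Lambda}_1^*$, let $\bar{\mathbf{x}}_1 \in \Lambda_1^*$ denote
its projection orthogonal to $\operatorname{span}(\tilde{\Lambda}_2^*)$. Then for any $s > 0$ we have

$$\begin{aligned}
\rho_{1/s}(\Lambda^*) &= \sum_{\mathbf{x}_1 \in \tilde{\Lambda}_1^*} \sum_{\mathbf{x}_2 \in \tilde{\Lambda}_2^*} \rho_{1/s}(\mathbf{x}_1 + \mathbf{x}_2) \\
&= \sum_{\mathbf{x}_1 \in \tilde{\Lambda}_1^*} \sum_{\mathbf{x}_2 \in \tilde{\Lambda}_2^*} \rho_{1/s}(\bar{\mathbf{x}}_1) \cdot \rho_{1/s}((\mathbf{x}_1 - \bar{\mathbf{x}}_1) + \mathbf{x}_2) \\
&= \sum_{\mathbf{x}_1 \in \tilde{\Lambda}_1^*} \rho_{1/s}(\bar{\mathbf{x}}_1) \cdot \rho_{1/s}((\mathbf{x}_1 - \bar{\mathbf{x}}_1) + \tilde{\Lambda}_2^*) \\
&\leq \rho_{1/s}(\Lambda_1^*) \cdot \rho_{1/s}(\tilde{\Lambda}_2^*) = \rho_{1/s}(\Lambda_1^* + \tilde{\Lambda}_2^*) = \rho_{1/s}((\Lambda_1 + \tilde{\Lambda}_2)^*),
\end{aligned}$$

where the inequality follows from the bound $\rho_{1/s}(\Lambda + \mathbf{c}) \leq \rho_{1/s}(\Lambda)$ from [19, Lemma 2.9], and the last two
equalities follow from the orthogonality of $\Lambda_1^*$ and $\tilde{\Lambda}_2^*$. This proves that $\eta_\epsilon(\Lambda) \leq \eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2)$.

Finally, for $s_1 = \eta_{\epsilon_1}(\Lambda_1)$, $s_2 = \eta_{\epsilon_2}(\tilde{\Lambda}_2)$ and $s = \max\{s_1, s_2\}$, we have

$$\rho_{1/s}((\Lambda_1 + \tilde{\Lambda}_2)^*) = \rho_{1/s}(\Lambda_1^*) \cdot \rho_{1/s}(\tilde{\Lambda}_2^*) \leq \rho_{1/s_1}(\Lambda_1^*) \cdot \rho_{1/s_2}(\tilde{\Lambda}_2^*) = (1 + \epsilon_1)(1 + \epsilon_2) = 1 + \epsilon.$$

Therefore, $\eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2) \leq s$. □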

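Using the decomposition lemma, one easily obtains known bounds on the smoothing parameter. For
example, for any lattice basis $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_n]$, applying Lemma 2.6 repeatedly to the decomposition into
the rank-1 lattices defined by each of the basis vectors yields $\eta(\mathbf{B} \cdot \mathbb{Z}^n) \leq \max_i \eta(\tilde{\mathbf{b}}_i \cdot \mathbb{Z}) = \|\tilde{\mathbf{B}}\| \cdot \omega_n$,
where $\omega_n = \eta(\mathbb{Z}) = \omega(\sqrt{\log n})$ is the smoothing parameter of the integer lattice $\mathbb{Z}$. Choosing a basis $\mathbf{B}$
achieving $\mathrm{bl}(\Lambda) = \min_{\mathbf{B}} \|\tilde{\mathbf{B}}\|$ (where the minimum is taken over all bases $\mathbf{B}$ of $\Lambda$), we get the bound
$\eta(\Lambda) \leq \mathrm{bl}(\Lambda) \cdot \omega_n$ from [12, Theorem 3.1]. Similarly, choosing a set $\mathbf{S} \subset \Lambda$ of linearly independent vectors
of length $\|\mathbf{S}\| \leq \lambda_n(\Lambda)$, we get the bound $\eta(\Lambda) \leq \eta(\mathbf{S} \cdot \mathbb{Z}^n) \leq \|\tilde{\mathbf{S}}\| \cdot \omega_n \leq \|\mathbf{S}\| \cdot \omega_n \leq \lambda_n(\Lambda) \cdot \omega_n$ from [19,
Lemma 3.3]. In this paper we use a further generalization of these bounds, still easily obtained from the
decomposition lemma.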
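Both ends of Lemma 2.6 can be checked numerically on a toy example. The NumPy sketch below estimates $\eta_\epsilon$ of a 2-dimensional lattice by truncated enumeration of the dual lattice $\mathbf{B}^{-T}\mathbb{Z}^2$ plus bisection, then compares it with the lower bound $\|\tilde{\mathbf{b}}_2\| \cdot \eta_\epsilon(\mathbb{Z})$ and with the upper bound $\max_i \|\tilde{\mathbf{b}}_i\| \cdot \eta_{\epsilon_1}(\mathbb{Z})$, where $(1 + \epsilon_1)^2 = 1 + \epsilon$ (both use the scaling rule $\eta(c\,\mathbb{Z}) = c\,\eta(\mathbb{Z})$). The basis, $\epsilon$ and the truncation radius $K$ are arbitrary choices; this is a sanity check, not a general smoothing-parameter calculator.

```python
import math
import numpy as np


def eta_estimate(B, eps, K=20, lo=1e-2, hi=100.0, iters=80):
    """Estimate eta_eps(B Z^n) for a small full-rank basis B (columns are basis
    vectors) by bisection on s. The dual lattice is B^{-T} Z^n, and the sum
    rho_{1/s}(dual minus {0}) is truncated to coefficient vectors in [-K, K]^n,
    which is adequate only for small, well-conditioned toy examples."""
    D = np.linalg.inv(B).T
    n = B.shape[1]
    axes = [np.arange(-K, K + 1)] * n
    coeffs = np.array(np.meshgrid(*axes)).reshape(n, -1)
    coeffs = coeffs[:, np.any(coeffs != 0, axis=0)]   # drop the zero vector
    sq_norms = np.sum((D @ coeffs) ** 2, axis=0)       # ||D k||^2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if np.sum(np.exp(-math.pi * mid ** 2 * sq_norms)) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


eps = 1e-10
eps1 = math.expm1(0.5 * math.log1p(eps))                # (1 + eps1)^2 = 1 + eps
B = np.array([[3.0, 1.0],
              [0.0, 2.0]])
gs_lengths = np.abs(np.diag(np.linalg.qr(B)[1]))        # ||b~_1|| = 3, ||b~_2|| = 2
eta_B = eta_estimate(B, eps)
lower = gs_lengths[1] * eta_estimate(np.array([[1.0]]), eps)     # eta_eps(b~_2 Z)
upper = gs_lengths.max() * eta_estimate(np.array([[1.0]]), eps1)
print(round(lower, 3), round(eta_B, 3), round(upper, 3))
```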

Corollary 2.7. The smoothing parameter of the tensor product of any two lattices $\Lambda_1, \Lambda_2$ satisfies $\eta(\Lambda_1 \otimes \Lambda_2) \leq \mathrm{bl}(\Lambda_1) \cdot \eta(\Lambda_2)$.

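Proof. Let $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_k]$ be a basis of $\Lambda_1$ achieving $\max_i \|\tilde{\mathbf{b}}_i\| = \mathrm{bl}(\Lambda_1)$, and consider the natural
decomposition of $\Lambda_1 \otimes \Lambda_2$ into the sum

$$(\mathbf{b}_1 \otimes \Lambda_2) + \cdots + (\mathbf{b}_k \otimes \Lambda_2).$$

Notice that the projection of each sublattice $\mathbf{b}_i \otimes \Lambda_2$ orthogonal to the previous sublattices $\mathbf{b}_j \otimes \Lambda_2$ (for
$j < i$) is precisely $\tilde{\mathbf{b}}_i \otimes \Lambda_2$, and has smoothing parameter $\eta(\tilde{\mathbf{b}}_i \otimes \Lambda_2) = \|\tilde{\mathbf{b}}_i\| \cdot \eta(\Lambda_2)$. Therefore, by repeated
application of Lemma 2.6, we have $\eta(\Lambda_1 \otimes \Lambda_2) \leq \max_i \|\tilde{\mathbf{b}}_i\| \cdot \eta(\Lambda_2) = \mathrm{bl}(\Lambda_1) \cdot \eta(\Lambda_2)$. □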
The following proposition relates the problem of sampling lattice vectors according to a Gaussian
distribution  to  the  SIVP. 

Proposition 2.8 ([24], Lemma 3.17). There is a polynomial time algorithm that, given a basis for an $n$-dimensional lattice $\Lambda$ and polynomially many samples from $D_{\Lambda,\sigma}$ for some $\sigma \geq 2\eta(\Lambda)$, solves $\mathrm{SIVP}_\gamma$ on
input lattice $\Lambda$ (in the worst case over $\Lambda$, and with overwhelming probability over the choice of the lattice
samples) for approximation factor $\gamma = \sigma\sqrt{n} \cdot \omega_n$.

2.4  The  SIS  and  LWE  Functions 

In  this  paper  we  are  interested  in  two  special  families  of  functions,  which  are  the  fundamental  building  blocks 
of lattice cryptography. Both families are parametrized by three integers $m$, $n$ and $q$, and a set $X \subseteq \mathbb{Z}^m$ of
short vectors. Usually $n$ serves as a security parameter and $m$ and $q$ are functions of $n$.

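The Short Integer Solution function family $\mathrm{SIS}(m, n, q, X)$ is the set of all functions $f_{\mathbf{A}}$ indexed by
$\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ with domain $X \subseteq \mathbb{Z}^m$ and range $Y = \mathbb{Z}_q^n$, defined as $f_{\mathbf{A}}(\mathbf{x}) = \mathbf{A}\mathbf{x} \bmod q$. The Learning
With Errors function family $\mathrm{LWE}(m, n, q, X)$ is the set of all functions $g_{\mathbf{A}}$ indexed by $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ with
domain $\mathbb{Z}_q^n \times X$ and range $Y = \mathbb{Z}_q^m$, defined as $g_{\mathbf{A}}(\mathbf{s}, \mathbf{x}) = \mathbf{A}^T\mathbf{s} + \mathbf{x} \bmod q$. Both function families are
endowed with the uniform distribution over $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$. We omit the set $X$ from the notation $\mathrm{SIS}(m, n, q)$
and $\mathrm{LWE}(m, n, q)$ when clear from the context, or unimportant.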
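Both definitions translate directly into a few lines of NumPy. The sketch below uses toy parameters of our choosing that are far too small to be secure; it only shows the shapes of the inputs and outputs and that $f_{\mathbf{A}}$ is linear modulo $q$.

```python
import numpy as np


def sis(A, x, q):
    """SIS function f_A(x) = A x mod q, for A in Z_q^{n x m} and short x in Z^m."""
    return (A @ x) % q


def lwe(A, s, x, q):
    """LWE function g_A(s, x) = A^T s + x mod q, for s in Z_q^n and short x in Z^m."""
    return (A.T @ s + x) % q


rng = np.random.default_rng(0)
n, m, q = 4, 12, 97
A = rng.integers(0, q, size=(n, m))     # A uniform in Z_q^{n x m}
x = rng.integers(-1, 2, size=m)         # a short vector in X = {-1, 0, 1}^m
s = rng.integers(0, q, size=n)          # s uniform in Z_q^n
print(sis(A, x, q))                     # an element of Z_q^n
print(lwe(A, s, x, q))                  # an element of Z_q^m
```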

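In the context of collision resistance, we sometimes write $\mathrm{SIS}(m, n, q, \beta)$ for some real $\beta > 0$, without
an explicit domain $X$. Here the collision-finding problem is, given $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, to find distinct $\mathbf{x}, \mathbf{x}' \in \mathbb{Z}^m$
such that $\|\mathbf{x} - \mathbf{x}'\| \leq \beta$ and $f_{\mathbf{A}}(\mathbf{x}) = f_{\mathbf{A}}(\mathbf{x}')$. It is easy to see that this is equivalent to finding a nonzero
$\mathbf{z} \in \mathbb{Z}^m$ of length at most $\|\mathbf{z}\| \leq \beta$ such that $f_{\mathbf{A}}(\mathbf{z}) = \mathbf{0}$: take $\mathbf{z} = \mathbf{x} - \mathbf{x}'$, and conversely, $\mathbf{z}$ and $\mathbf{0}$ form a collision.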
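The two directions of this equivalence are one-line maps. The Python sketch below (hypothetical helper names; the brute-force search is only feasible for toy sizes) finds a collision in $\{0,1\}^m$, where one is guaranteed because $2^m > q^n$, and converts it into a short nonzero vector in the kernel of $\mathbf{A}$ modulo $q$.

```python
import itertools
import numpy as np


def collision_to_kernel(A, x, x_prime, q):
    """From a collision f_A(x) = f_A(x') with x != x', return z = x - x', which is
    nonzero, satisfies A z = 0 mod q, and has ||z|| = ||x - x'||."""
    z = np.asarray(x) - np.asarray(x_prime)
    if not np.any(z) or np.any((A @ z) % q):
        raise ValueError("not a valid collision")
    return z


def kernel_to_collision(z):
    """Conversely, a nonzero z with A z = 0 mod q gives the collision (z, 0)."""
    z = np.asarray(z)
    return z, np.zeros_like(z)


def find_binary_collision(A, q):
    """Exhaustive search for a collision among x in {0,1}^m (toy sizes only)."""
    seen = {}
    for x in itertools.product((0, 1), repeat=A.shape[1]):
        key = tuple((A @ np.array(x)) % q)
        if key in seen:
            return np.array(seen[key]), np.array(x)
        seen[key] = x
    return None


rng = np.random.default_rng(2)
n, m, q = 2, 6, 5                       # 2^6 = 64 > 5^2 = 25 outputs
A = rng.integers(0, q, size=(n, m))
x, x_prime = find_binary_collision(A, q)
z = collision_to_kernel(A, x, x_prime, q)
print(z, np.linalg.norm(z))             # z is in {-1,0,1}^6, so ||z|| <= sqrt(6)
```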

For other security properties (e.g., one-wayness, uninvertibility, etc.), the most commonly used classes of
domains and input distributions $\mathcal{X}$ for SIS are the uniform distribution $\mathcal{U}(X)$ over the set $X = \{0, \ldots, s-1\}^m$
or $X = \{-s, \ldots, 0, \ldots, s\}^m$, and the discrete Gaussian distribution $D_{\mathbb{Z},s}^m$. Usually, this distribution is
restricted to the set of short vectors $X = \{\mathbf{x} \in \mathbb{Z}^m : \|\mathbf{x}\| \leq s\sqrt{m}\}$, which carries all but a $2^{-\Omega(m)}$ fraction
of the probability mass of $D_{\mathbb{Z},s}^m$.

For the LWE function family, the input is usually chosen according to distribution $\mathcal{U}(\mathbb{Z}_q^n) \times \mathcal{X}$, where $\mathcal{X}$
is  one  of  the  SIS  input  distributions.  This  makes  the  SIS  and  LWE  function  families  essentially  equivalent, 
as  shown  in  the  following  proposition. 

Proposition 2.9 ([15, 17]). For any $n$, $m \geq n + \omega(\log n)$, $q$, and distribution $\mathcal{X}$ over $\mathbb{Z}^m$, the $\mathrm{LWE}(m, n, q)$
function family is one-way (resp. pseudorandom, or uninvertible) with respect to input distribution $\mathcal{U}(\mathbb{Z}_q^n) \times \mathcal{X}$
if and only if the $\mathrm{SIS}(m, m - n, q)$ function family is one-way (resp. pseudorandom, or uninvertible) with
respect to the input distribution $\mathcal{X}$.
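The mechanism behind this equivalence can be illustrated with a parity-check matrix. The NumPy sketch below makes a simplifying assumption of ours: $\mathbf{A}^T = [\mathbf{I}_n; \mathbf{A}']$ is in systematic form, so that $\mathbf{H} = [-\mathbf{A}' \mid \mathbf{I}_{m-n}] \in \mathbb{Z}_q^{(m-n) \times m}$ satisfies $\mathbf{H}\mathbf{A}^T = \mathbf{0} \pmod q$ by construction. Multiplying an LWE output by $\mathbf{H}$ removes the secret and leaves the $\mathrm{SIS}(m, m-n, q)$ value $f_{\mathbf{H}}(\mathbf{x})$; conversely, once $\mathbf{x}$ is known, $\mathbf{s}$ can be read off. This only illustrates the idea; it is not the proof from [15, 17], which handles uniformly random $\mathbf{A}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 3, 8, 97

# Assumption: A^T = [I_n ; A'] is in systematic form (an m x n matrix).
A_prime = rng.integers(0, q, size=(m - n, n))
A_T = np.vstack([np.eye(n, dtype=np.int64), A_prime])
# Parity-check matrix H = [-A' | I_{m-n}], so that H A^T = -A' + A' = 0 (mod q).
H = np.hstack([(-A_prime) % q, np.eye(m - n, dtype=np.int64)])
assert not np.any((H @ A_T) % q)

s = rng.integers(0, q, size=n)          # LWE secret
x = rng.integers(-2, 3, size=m)         # short error vector
b = (A_T @ s + x) % q                   # LWE output g_A(s, x)

# Multiplying by H eliminates s: H b = H x (mod q), an SIS(m, m - n, q) value.
print(np.array_equal((H @ b) % q, (H @ x) % q))   # True

# Given x, the secret is read off the identity block: s = (b - x)[:n] mod q.
print(np.array_equal(((b - x) % q)[:n], s))       # True
```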

In  applications,  the  SIS  function  family  is  typically  used  with  larger  input  domains  X  for  which  the 
functions  are  surjective  but  not  injective,  while  the  LWE  function  family  is  used  with  smaller  domains  X  for 
which the functions are injective, but not surjective. The results in this paper are more naturally stated using
the  SIS  function  family,  so  we  will  use  the  SIS  formulation  to  establish  our  main  results,  and  then  reformulate 
them  in  terms  of  the  LWE  function  family  by  invoking  Proposition  2.9.  We  also  use  Proposition  2.9  to 
reformulate  known  hardness  results  (from  worst-case  complexity  assumptions)  for  LWE  in  terms  of  SIS. 

Assuming the quantum worst-case hardness of standard lattice problems, Regev [24] showed that the
$\mathrm{LWE}(m, n, q)$ function family is hard to invert with respect to the discrete Gaussian error distribution $D_{\mathbb{Z},\sigma}^m$
for any $\sigma \geq 2\sqrt{n}$. (See also [21] for a classical reduction that requires $q$ to be exponentially large in $n$.
Because we are concerned with small parameters in this work, we focus mainly on the implications of the
quantum reduction.)

Proposition 2.10 ([24], Theorem 3.1). For any $m = n^{O(1)}$, integer $q$ and real $\alpha \in (0, 1)$ such that $\alpha q \geq
2\sqrt{n}$, there is a polynomial time quantum reduction from sampling $D_{\Lambda,\sigma}$ (for any $n$-dimensional lattice $\Lambda$
and $\sigma \geq (\sqrt{2n}/\alpha)\eta(\Lambda)$) to inverting the $\mathrm{LWE}(m, n, q)$ function family with error distribution $D_{\mathbb{Z},\alpha q}^m$.

Combining  Propositions  2.8,  2.9  and  2.10,  we  get  the  following  corollary. 

Corollary 2.11. For any positive $m, n$ such that $\omega(\log n) \leq m - n \leq n^{O(1)}$ and $2\sqrt{n} \leq \sigma \leq q$, the
$\mathrm{SIS}(m, m - n, q)$ function family is uninvertible with respect to input distribution $D_{\mathbb{Z},\sigma}^m$, under the assumption
that no (quantum) algorithm can efficiently sample from a distribution statistically close to $D_{\Lambda,\sqrt{2n}q/\sigma}$.
In particular, assuming the worst-case (quantum) hardness of $\mathrm{SIVP}_{n\omega_n q/\sigma}$ over $n$-dimensional lattices,
the $\mathrm{SIS}(m, m - n, q)$ function family is uninvertible with respect to input distribution $D_{\mathbb{Z},\sigma}^m$.

We  use  the  fact  that  LWE/SIS  is  not  only  hard  to  invert,  but  also  pseudorandom.  This  is  proved  using 
search-to-decision reductions for those problems. The most general such reductions known to date are given
in  the  following  two  theorems. 

Theorem 2.12 ([17]). For any positive $m, n$ such that $\omega(\log n) \leq m - n \leq n^{O(1)}$, any positive $\sigma \leq n^{O(1)}$,
and any $q$ with no divisors in the interval $((\sigma/\omega_n)^{m/k}, \sigma \cdot \omega_n)$, where $k = m - n$, if $\mathrm{SIS}(m, m - n, q, D_{\mathbb{Z},\sigma}^m)$ is uninvertible,
then it is also pseudorandom.

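Notice that when $\sigma \geq \omega_n^{1+2k/n}$, the interval $((\sigma/\omega_n)^{m/k}, \sigma \cdot \omega_n)$ is empty (because then $(\sigma/\omega_n)^{m/k} \geq \sigma \cdot \omega_n$), and Theorem 2.12 holds
without any restriction on the factorization of the modulus $q$.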
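The divisor condition of Theorem 2.12 is easy to test for concrete numbers. In the Python sketch below, $\omega_n$ (an asymptotic factor in the theorem) is replaced by a hypothetical numeric stand-in, and $k = m - n$; the toy values $n = 256$, $m = 512$, $\sigma$ and $q$ are likewise our own. The interval is empty as soon as its left endpoint $(\sigma/\omega_n)^{m/k}$ reaches the right endpoint $\sigma \cdot \omega_n$, which for these values happens at $\sigma = 64$.

```python
def divisors(q):
    """All positive divisors of q (trial division; fine for small q)."""
    return [d for d in range(1, q + 1) if q % d == 0]


def divisors_in_forbidden_interval(q, sigma, omega_n, m, n):
    """Divisors of q in the open interval ((sigma/omega_n)^(m/k), sigma*omega_n),
    with k = m - n. An empty list means the divisor condition holds.
    omega_n is a numeric stand-in for an asymptotic factor (an assumption)."""
    k = m - n
    lo = (sigma / omega_n) ** (m / k)
    hi = sigma * omega_n
    return [d for d in divisors(q) if lo < d < hi]


# Toy values (hypothetical): n = 256, m = 512, so k = 256 and m/k = 2.
print(divisors_in_forbidden_interval(4093, 8, 4, 512, 256))   # prime q: []
print(divisors_in_forbidden_interval(4096, 8, 4, 512, 256))   # q = 2^12: [8, 16]
print(divisors_in_forbidden_interval(4096, 64, 4, 512, 256))  # empty interval: []
```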

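Theorem 2.13 ([18]). Let $q$ have prime factorization $q = p_1^{e_1} \cdots p_k^{e_k}$ for pairwise distinct $\mathrm{poly}(n)$-bounded
primes $p_i$ with each $e_i \geq 1$, and let $0 < \alpha < 1/\omega_n$. If $\mathrm{LWE}(m, n, q, D_{\mathbb{Z},\alpha q}^m)$ is hard to invert for all
$m(n) = n^{O(1)}$, then $\mathrm{LWE}(m', n, q, D_{\mathbb{Z},\alpha' q}^{m'})$ is pseudorandom for any $m' = n^{O(1)}$ and

$$\alpha' \geq \max\{\alpha, \omega_n^{1+1/\ell} \cdot \alpha^{1/\ell}, \omega_n/p_1^{e_1}, \ldots, \omega_n/p_k^{e_k}\},$$

where $\ell$ is an upper bound on the number of prime factors $p_i < \omega_n/\alpha'$.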

In  this  work  we  focus  on  the  use  of  Theorem  2.12,  because  it  guarantees  pseudorandomness  for  the  same 
value  of  m  as  for  the  assumed  one-wayness.  This  feature  is  important  for  applying  our  results  from  Section  4, 
which guarantee one-wayness for particular values of $m$ (but not necessarily all $m = n^{O(1)}$).

Corollary 2.14. For any positive $m, n, \sigma, q$ such that $\omega(\log n) \leq m - n \leq n^{O(1)}$ and $2\sqrt{n} \leq \sigma \leq q \leq n^{O(1)}$,
if $q$ has no divisors in the range $((\sigma/\omega_n)^{1+n/k}, \sigma \cdot \omega_n)$, where $k = m - n$, then the $\mathrm{SIS}(m, m - n, q)$ function family is
pseudorandom with respect to input distribution $D_{\mathbb{Z},\sigma}^m$, under the assumption that no (quantum) algorithm
can efficiently sample (up to negligible statistical errors) $D_{\Lambda,\sqrt{2n}q/\sigma}$.

In particular, assuming the worst-case (quantum) hardness of $\mathrm{SIVP}_{n\omega_n q/\sigma}$ on $n$-dimensional lattices, the
$\mathrm{SIS}(m, m - n, q)$ function family is pseudorandom with respect to input distribution $D_{\mathbb{Z},\sigma}^m$.
3 Hardness of SIS with Small Modulus


We  first  prove  a  simple  “success  amplification”  lemma  for  collision-finding  in  SIS,  which  says  that  any 
inverse-polynomial  advantage  can  be  amplified  to  essentially  1,  at  only  the  expense  of  a  larger  runtime  and 
value  of  m  (which  will  have  no  ill  effects  on  our  final  results).  Therefore,  for  the  remainder  of  this  section  we 
implicitly  restrict  our  attention  to  collision-finding  algorithms  that  have  overwhelming  advantage. 

Lemma 3.1. For arbitrary $n, q, m$ and $X \subseteq \mathbb{Z}^m$, suppose there exists a probabilistic algorithm $\mathcal{A}$ that has
advantage $\epsilon > 0$ in collision-finding for $\mathrm{SIS}(m, n, q, X)$. Then there exists a probabilistic algorithm $\mathcal{B}$ that
has advantage $1 - (1 - \epsilon)^t \geq 1 - \exp(-\epsilon t) = 1 - \exp(-n)$ in collision-finding for $\mathrm{SIS}(M = t \cdot m, n, q, X')$,
where $t = n/\epsilon$ and $X' = \bigcup_{i=1}^{t} (\{0^m\}^{i-1} \times X \times \{0^m\}^{t-i})$. The runtime of $\mathcal{B}$ is essentially $t$ times that of $\mathcal{A}$.

Proof. The algorithm $\mathcal{B}$ simply partitions its input $A \in \mathbb{Z}_q^{n \times M}$ into blocks $A_i \in \mathbb{Z}_q^{n \times m}$ and invokes $\mathcal{A}$ (with fresh random coins) on each of them, until $\mathcal{A}$ returns a valid collision $x, x' \in X$ for some $A_i$. Then $\mathcal{B}$ returns

$$(0^{m(i-1)}, x, 0^{m(t-i)}), \; (0^{m(i-1)}, x', 0^{m(t-i)}) \in X'$$

as a collision for $A$. Clearly, $\mathcal{B}$ succeeds if any call to $\mathcal{A}$ succeeds. Since all $t$ calls to $\mathcal{A}$ are on independent inputs $A_i$ and use independent coins, some call will succeed, except with probability $(1 - \epsilon)^t$. □
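The amplification step can be illustrated with a toy Python sketch. It is not the reduction from the text: `weak_finder` is a hypothetical stand-in for $\mathcal{A}$ that brute-forces a binary collision ($X = \{0,1\}^m$) and reports it only with probability $\epsilon$, and the parameters are tiny values chosen so that every block has a collision ($2^m > q^n$). The security parameter $n$ of the lemma is called `n_sec` to keep it apart from the row count `n`.

```python
import itertools
import math
import random


def brute_force_collision(A, q):
    """Return distinct binary x, x2 with A x = A x2 (mod q), or None if none exists."""
    n, m = len(A), len(A[0])
    seen = {}
    for x in itertools.product((0, 1), repeat=m):
        key = tuple(sum(A[r][j] * x[j] for j in range(m)) % q for r in range(n))
        if key in seen:
            return list(seen[key]), list(x)
        seen[key] = x
    return None


def weak_finder(A, q, eps, rng):
    """Hypothetical stand-in for algorithm A: reports a collision only with probability eps."""
    return brute_force_collision(A, q) if rng.random() < eps else None


def amplified_finder(A, q, m, eps, n_sec, rng):
    """Algorithm B: split the n x (t*m) input into t blocks of m columns and try each."""
    t = math.ceil(n_sec / eps)
    for i in range(t):
        block = [row[i * m:(i + 1) * m] for row in A]
        result = weak_finder(block, q, eps, rng)  # fresh coins on every call
        if result is not None:
            x, x2 = result
            left, right = [0] * (m * i), [0] * (m * (t - i - 1))
            # Embed the block collision into X' by zero-padding.
            return left + x + right, left + x2 + right
    return None  # happens with probability (1 - eps)**t <= exp(-n_sec)


rng = random.Random(2024)
n, q, m, eps, n_sec = 2, 3, 4, 0.25, 20   # 2**m > q**n, so every block has a collision
t = math.ceil(n_sec / eps)                 # t = 80 blocks
A = [[rng.randrange(q) for _ in range(t * m)] for _ in range(n)]
collision = amplified_finder(A, q, m, eps, n_sec, rng)
```

The padded pair is a collision for the whole matrix because the zero blocks contribute nothing to $Ax$.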

3.1  SIS-to-SIS  Reduction 

Our first proof that the $\mathrm{SIS}(m, n, q, \beta)$ function family is collision resistant for moduli $q$ as small as $n^{1/2+\delta}$ proceeds by a reduction between SIS problems with different parameters. Previous hardness results based on worst-case lattice assumptions require the modulus $q$ to be at least $\beta \cdot \omega(\sqrt{n \log n})$ [12, Theorem 9.2], and $\beta \ge \sqrt{n \log q}$ is needed to guarantee that a nontrivial solution exists. For such parameters, SIS is collision resistant assuming the hardness of approximating worst-case lattice problems to within $\approx \beta\sqrt{n}$ factors.

The intuition behind our proof for smaller moduli is easily explained. We reduce SIS with modulus $q^c$ and solution bound $\beta^c$ (for any constant integer $c \ge 1$) to SIS with modulus $q$ and bound $\beta$. Then as long as $(q/\beta)^c \ge \omega(\sqrt{n \log n})$, the former problem enjoys worst-case hardness, hence so does the latter. Thus we can take $q = \beta \cdot n^\delta$ for any constant $\delta > 0$, and $c > 1/(2\delta)$. Notice, however, that the underlying approximation factor for worst-case lattice problems is $\approx \beta^c\sqrt{n} > n^{1/2 + 1/(4\delta)}$ (using $\beta \ge \sqrt{n}$), which, while still polynomial, degrades severely as $\delta$ approaches 0. In the next subsection we give a direct reduction from worst-case lattice problems to SIS with a small modulus, which does not have this drawback.

The above discussion is formalized in the following proposition. For technical reasons, we prove that $\mathrm{SIS}(m, n, q, X)$ is collision resistant assuming that the domain $X$ has the property that all SIS solutions $z \in (X - X) \setminus \{0\}$ satisfy $\gcd(z, q) = 1$. This restriction is satisfied in many (but not all) common settings, e.g., when $q > \beta$ is prime, or when $X \subseteq \{0, 1\}^m$ is a set of binary vectors.

Proposition 3.2. Let $n, q, m, \beta$ and $X \subseteq \mathbb{Z}^m$ be such that $\gcd(x - x', q) = 1$ and $\|x - x'\| \le \beta$ for any distinct $x, x' \in X$. For any positive integer $c$, there is a deterministic reduction from collision-finding for $\mathrm{SIS}(m^c, n, q^c, \beta^c)$ to collision-finding for $\mathrm{SIS}(m, n, q, X)$ (in both cases, with overwhelming advantage). The reduction runs in time polynomial in its input size, and makes fewer than $m^c$ calls to its oracle.

Proof. Let $\mathcal{A}$ be an efficient algorithm that finds a collision for $\mathrm{SIS}(m, n, q, X)$ with overwhelming advantage. We use it to find a nonzero solution for $\mathrm{SIS}(m^c, n, q^c, \beta^c)$. Let $A \in \mathbb{Z}_{q^c}^{n \times m^c}$ be an input SIS instance. Partition the columns of $A$ into $m^{c-1}$ blocks $A_i \in \mathbb{Z}_{q^c}^{n \times m}$, and for each one, invoke $\mathcal{A}$ to find a collision modulo $q$, i.e., a pair of distinct vectors $x_i, x'_i \in X$ such that $A_i z_i = 0 \bmod q$, where $z_i = x_i - x'_i$ and $\|z_i\| \le \beta$.

For each $i$, since $\gcd(z_i, q) = 1$ and $A_i z_i = 0 \bmod q$, the vector $a'_i = (A_i z_i)/q \in \mathbb{Z}_{q^{c-1}}^n$ is uniformly random, even after conditioning on $z_i$ and $A_i \bmod q$. So, the matrix $A' \in \mathbb{Z}_{q^{c-1}}^{n \times m^{c-1}}$ made up of all these columns is uniformly random. By induction on $c$, using $\mathcal{A}$ we can find a nonzero solution $z' \in \mathbb{Z}^{m^{c-1}}$ such that $A'z' = 0 \bmod q^{c-1}$ and $\|z'\| \le \beta^{c-1}$. Then it is easy to verify that a nonzero solution for the original instance $A$ is given by $z = (z'_1 \cdot z_1, \ldots, z'_{m^{c-1}} \cdot z_{m^{c-1}}) \in \mathbb{Z}^{m^c}$, and that $\|z\| \le \|z'\| \cdot \max_i \|z_i\| \le \beta^c$. Finally, the total number of calls to $\mathcal{A}$ is $\sum_{i=0}^{c-1} m^i < m^c$, as claimed. □
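The recursion in the proof of Proposition 3.2 can be sketched in Python for toy parameters. The sketch assumes $X = \{0,1\}^m$, so every difference $z_i$ has entries in $\{-1, 0, 1\}$ and $\gcd(z_i, q) = 1$. `find_collision` is a hypothetical brute-force stand-in for the oracle that works on every input, so the uniformity argument in the proof is not exercised; only the algebra and the call count are.

```python
import itertools
import random

calls = {"oracle": 0}


def find_collision(A, q):
    """Toy oracle for SIS(m, n, q, {0,1}^m): brute-force binary x != x2 with
    A x = A x2 (mod q) and return z = x - x2 (entries in {-1, 0, 1})."""
    calls["oracle"] += 1
    n, m = len(A), len(A[0])
    seen = {}
    for x in itertools.product((0, 1), repeat=m):
        key = tuple(sum(A[r][j] * x[j] for j in range(m)) % q for r in range(n))
        if key in seen:
            return [a - b for a, b in zip(x, seen[key])]
        seen[key] = x
    raise ValueError("pigeonhole needs 2**m > q**n")


def sis_to_sis(A, q, m, c):
    """Return a nonzero z with A z = 0 (mod q**c), where A is n x m**c."""
    if c == 1:
        return find_collision(A, q)
    n, modulus = len(A), q ** c
    block_solutions, new_columns = [], []
    for i in range(m ** (c - 1)):
        Ai = [row[i * m:(i + 1) * m] for row in A]
        zi = find_collision(Ai, q)
        # A_i z_i = 0 (mod q), so every entry of (A_i z_i mod q^c) is divisible by q.
        column = [(sum(Ai[r][j] * zi[j] for j in range(m)) % modulus) // q
                  for r in range(n)]
        block_solutions.append(zi)
        new_columns.append(column)
    # A' (n x m^(c-1)) over Z_{q^(c-1)} has the vectors a'_i = (A_i z_i)/q as columns.
    A_prime = [[col[r] for col in new_columns] for r in range(n)]
    z_prime = sis_to_sis(A_prime, q, m, c - 1)
    # z = (z'_1 z_1, ..., z'_{m^(c-1)} z_{m^(c-1)})
    return [z_prime[i] * v for i, zi in enumerate(block_solutions) for v in zi]


rng = random.Random(7)
n, q, m, c = 2, 3, 4, 2
A = [[rng.randrange(q ** c) for _ in range(m ** c)] for _ in range(n)]
z = sis_to_sis(A, q, m, c)
```

Here the oracle is called $m + 1 = 5$ times, matching $\sum_{i=0}^{c-1} m^i$ for $c = 2$.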

3.2  Direct  Reduction 

As mentioned above, the large worst-case approximation factor associated with the use of Proposition 3.2 is undesirable, as is (to a lesser extent) the restriction that $\gcd(X - X, q) = 1$. To eliminate these drawbacks, we next give a direct proof that SIS is collision resistant for small $q$, based on the assumed hardness of worst-case lattice problems. The underlying approximation factor for these problems can be as small as $\tilde{O}(\beta\sqrt{n})$, which matches the best known factors obtained by previous proofs (which require a larger modulus $q$). Our new proof combines ideas from [19, 12] and Proposition 3.2, as well as a new convolution theorem for discrete Gaussians which strengthens similar ones previously proved in [22, 6].

Our  proof  of  the  convolution  theorem  is  substantially  different  and,  we  believe,  technically  simpler  than 
the  prior  ones.  In  particular,  it  handles  the  sum  of  many  Gaussian  samples  all  at  once,  whereas  previous  proofs 
used  induction  from  a  base  case  of  two  samples.  With  the  inductive  approach,  it  is  technically  complex  to 
verify  that  all  the  intermediate  Gaussian  parameters  (which  involve  harmonic  means)  satisfy  the  hypotheses. 
Moreover,  the  intermediate  parameters  can  depend  on  the  order  in  which  the  samples  are  added  in  the 
induction,  leading  to  unnecessarily  strong  hypotheses  on  the  original  parameters. 

Theorem 3.3. Let $\Lambda$ be an $n$-dimensional lattice, $z \in \mathbb{Z}^m$ a nonzero integer vector, $s_i \ge \sqrt{2}\|z\|_\infty \cdot \eta(\Lambda)$, and $\Lambda + c_i$ arbitrary cosets of $\Lambda$ for $i = 1, \ldots, m$. Let $y_i$ be independent vectors with distributions $D_{\Lambda + c_i, s_i}$, respectively. Then the distribution of $y = \sum_i z_i y_i$ is statistically close to $D_{Y, s}$, where $Y = \gcd(z)\Lambda + c$, $c = \sum_i z_i c_i$, and $s = \sqrt{\sum_i (z_i s_i)^2}$.

In particular, if $\gcd(z) = 1$ and $\sum_i z_i c_i \in \Lambda$, then $y$ is distributed statistically close to $D_{\Lambda, s}$.

Proof. First we verify that the support of $y$ is

$$\sum_i z_i(\Lambda + c_i) = \sum_i z_i \Lambda + \sum_i z_i \cdot c_i = \gcd(z)\Lambda + c = Y.$$

So it remains to prove that each $y \in Y$ has probability (nearly) proportional to $\rho_s(y)$.

For the remainder of the proof we use the following convenient scaling. Define the diagonal matrices $S = \mathrm{diag}(s_1, \ldots, s_m)$ and $S' = S \otimes I_n$, and the $mn$-dimensional lattice $\Lambda' = \bigoplus_i (s_i^{-1}\Lambda) = (S')^{-1} \cdot \Lambda^{\oplus m}$, where $\oplus$ denotes the (external) direct sum of lattices and $\Lambda^{\oplus m} = \mathbb{Z}^m \otimes \Lambda$ is the direct sum of $m$ copies of $\Lambda$. Then by independence of the $y_i$, it can be seen that $y' = (S')^{-1} \cdot (y_1, \ldots, y_m)$ has discrete Gaussian distribution $D_{\Lambda' + c'}$ (with parameter 1), where $c' = (S')^{-1} \cdot (c_1, \ldots, c_m)$.

The output vector $y = \sum_i z_i y_i$ can be expressed, using the mixed-product property for Kronecker products, as

$$y = (z^T \otimes I_n) \cdot (y_1, \ldots, y_m) = (z^T \otimes I_n) \cdot S' \cdot y' = ((z^T S) \otimes I_n) \cdot y'.$$

So, letting $Z = ((z^T S) \otimes I_n)$, we want to prove that the distribution of $y \sim Z \cdot D_{\Lambda' + c'}$ is statistically close to $D_{Y, s}$.
Fix any vectors $x' \in \Lambda' + c'$ and $\bar{y} = Z x' \in Y$, and define the proper sublattice

$$L = \{v \in \Lambda' : Zv = 0\} = \Lambda' \cap \ker(Z) \subsetneq \Lambda'.$$

It is immediate to verify that the set of all $y' \in \Lambda' + c'$ such that $Zy' = \bar{y}$ is $(\Lambda' + c') \cap (x' + \ker(Z)) = L + x'$. Let $x$ be the orthogonal projection of $x'$ onto $\ker(Z) \supseteq L$. Then we have

$$\Pr[y = \bar{y}] = \frac{\rho(L + x')}{\rho(\Lambda' + c')} = \rho(x' - x) \cdot \frac{\rho(L + x)}{\rho(\Lambda' + c')}.$$

Below we show that $\eta(L) \le 1$, which implies that $\rho(L + x)$ is essentially the same for all values of $x'$, and hence for all $\bar{y}$. Therefore, we just need to analyze $\rho(x' - x)$.

Since $Z^T$ is an orthogonal basis for $\ker(Z)^\perp$, each of whose columns has Euclidean norm $s = \sqrt{\sum_i (z_i s_i)^2}$, we have $x' - x = (Z^T Z x')/s^2$, and

$$\|x' - x\|^2 = \langle x', Z^T Z x' \rangle / s^2 = \|Zx'\|^2 / s^2 = (\|\bar{y}\|/s)^2.$$

Therefore, $\rho(x' - x) = \rho_s(\bar{y})$, and so $\Pr[y = \bar{y}]$ is essentially proportional to $\rho_s(\bar{y})$, i.e., the statistical distance between $y$ and $D_{Y,s}$ is negligible.

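It remains to bound the smoothing parameter of $L$. Consider the $m$-dimensional integer lattice $\mathcal{Z} = \mathbb{Z}^m \cap \ker(z^T) = \{v \in \mathbb{Z}^m : \langle z, v \rangle = 0\}$. Because $(\mathcal{Z} \otimes \Lambda) \subseteq (\mathbb{Z}^m \otimes \Lambda)$ and $S^{-1}\mathcal{Z} \subseteq \ker(z^T S)$, it is straightforward to verify from the definitions that

$$(S')^{-1} \cdot (\mathcal{Z} \otimes \Lambda) = ((S^{-1}\mathcal{Z}) \otimes \Lambda)$$

is a sublattice of $L$. It follows from Corollary 2.7 and by scaling that

$$\eta(L) \le \eta((S')^{-1} \cdot (\mathcal{Z} \otimes \Lambda)) \le \eta(\Lambda) \cdot \widetilde{\mathrm{bl}}(\mathcal{Z}) / \min_i s_i.$$

Finally, $\widetilde{\mathrm{bl}}(\mathcal{Z}) \le \min\{\|z\|, \sqrt{2}\|z\|_\infty\}$ because $\mathcal{Z}$ has a full-rank set of vectors $z_i \cdot e_j - z_j \cdot e_i$, where the index $i$ minimizes $|z_i|$ among the nonzero entries of $z$, and $j$ ranges over $\{1, \ldots, m\} \setminus \{i\}$. By assumption on the $s_i$, we have $\eta(L) \le 1$ as desired, and the proof is complete. □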
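As a numerical sanity check of Theorem 3.3 in the simplest setting, the Python sketch below takes $\Lambda = \mathbb{Z}$ with trivial cosets, computes the exact distribution of $z_1 y_1 + z_2 y_2$ for truncated one-dimensional discrete Gaussians, and compares it with $D_{\mathbb{Z}, s}$ for $s = \sqrt{\sum_i (z_i s_i)^2}$. The lattice, the parameters and the truncation bounds are illustrative choices, not taken from the text.

```python
import math


def discrete_gaussian(s, bound):
    """D_{Z,s} truncated to [-bound, bound]: P(x) proportional to rho_s(x) = exp(-pi x^2 / s^2)."""
    weights = {x: math.exp(-math.pi * x * x / (s * s)) for x in range(-bound, bound + 1)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def weighted_sum_distribution(dists, z):
    """Exact distribution of sum_i z_i * y_i for independent y_i ~ dists[i]."""
    result = {0: 1.0}
    for zi, dist in zip(z, dists):
        combined = {}
        for a, pa in result.items():
            for b, pb in dist.items():
                combined[a + zi * b] = combined.get(a + zi * b, 0.0) + pa * pb
        result = combined
    return result


def statistical_distance(p, r):
    keys = set(p) | set(r)
    return 0.5 * sum(abs(p.get(k, 0.0) - r.get(k, 0.0)) for k in keys)


# Lambda = Z, trivial cosets c_i = 0, z = (1, 2) so gcd(z) = 1.
# s_i = 8 exceeds sqrt(2) * ||z||_inf * eta_eps(Z), which is roughly 6.1 for eps = 2^-20
# by the standard bound eta_eps(Z) <= sqrt(ln(2 (1 + 1/eps)) / pi).
z = (1, 2)
s_i = (8.0, 8.0)
y_dist = weighted_sum_distribution([discrete_gaussian(si, 60) for si in s_i], z)
s = math.sqrt(sum((zi * si) ** 2 for zi, si in zip(z, s_i)))  # sqrt(320)
distance = statistical_distance(y_dist, discrete_gaussian(s, 180))

# With z = (2, 2), gcd(z) = 2, so the support is 2Z rather than Z.
support_2z = weighted_sum_distribution([discrete_gaussian(8.0, 20)] * 2, (2, 2))
```

The truncation bounds discard only negligible tail mass at these widths, so any distance that remains comes from the convolution itself or from floating-point rounding.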


Remark 3.4. Although we will not need it in this work, we note that the statement and proof of Theorem 3.3 can be adapted to the case where the $y_i$ respectively have non-spherical discrete Gaussian distributions $D_{\Lambda_i + c_i, \sqrt{\Sigma_i}}$ with positive definite “covariance” parameters $\Sigma_i \in \mathbb{R}^{n \times n}$, over cosets of possibly different lattices $\Lambda_i$. (See [22] for a formal definition of these distributions.)

In this setting, by scaling $\Lambda_i$ and $\Sigma_i$ we can assume without loss of generality that $z = (1, 1, \ldots, 1)$. The theorem statement says that $y$'s distribution is close to a discrete Gaussian (over an appropriate lattice coset) with covariance parameter $\Sigma = \sum_i \Sigma_i$, under mild assumptions on the $\Sigma_i$. In the proof we simply let $S'$ be the block-diagonal matrix with the $\sqrt{\Sigma_i}$ as its diagonal blocks, let $\Lambda' = (S')^{-1} \cdot \bigoplus_i \Lambda_i$, and let $Z = (z^T \otimes I_n) \cdot S' = [\sqrt{\Sigma_1} \mid \cdots \mid \sqrt{\Sigma_m}]$. Then the only technical difference is in bounding the smoothing parameter of $L$.


The  convolution  theorem  implies  the  following  simple  but  useful  lemma,  which  shows  how  to  convert 
samples  having  a  broad  range  of  parameters  into  ones  having  parameters  in  a  desired  narrow  range. 

Lemma 3.5. There is an efficient algorithm which, given a basis $B$ of some lattice $\Lambda$, some $R \ge \sqrt{2}$ and samples $(y_i, s_i)$ where each $s_i \in [\sqrt{2}, R] \cdot \eta(\Lambda)$ and each $y_i$ has distribution $D_{\Lambda, s_i}$, with overwhelming probability outputs a sample $(y, s)$ where $s \in [R, \sqrt{2}R] \cdot \eta(\Lambda)$ and $y$ has distribution statistically close to $D_{\Lambda, s}$.
Proof. Let $\omega_n = \omega(\sqrt{\log n})$ satisfy $\omega_n \le \sqrt{n}$. The algorithm draws $2n^2$ input samples, and works as follows: if at least $n^2$ of the samples have parameters $s_i \le R \cdot \eta(\Lambda)/(\sqrt{n} \cdot \omega_n)$, then with overwhelming probability they all have lengths bounded by $R \cdot \eta(\Lambda)/\omega_n$ and they include $n$ linearly independent vectors. Using such vectors we can construct a basis $S$ such that $\|\tilde{S}\| \le R \cdot \eta(\Lambda)/\omega_n$, and with the sampling algorithm of [12, Theorem 4.1] we can generate samples having parameter $R \cdot \eta(\Lambda)$.

Otherwise, at least $n^2$ of the samples $(y_i, s_i)$ have parameters $s_i > \max\{R/n, \sqrt{2}\} \cdot \eta(\Lambda)$. Then by summing an appropriate subset of those $y_i$, by the convolution theorem we can obtain a sample having parameter in the desired range. □
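The second case of the proof only asks for "an appropriate subset" of samples. One simple way to choose it, a greedy rule of our own rather than necessarily the authors', is sketched below in Python with parameters measured in units of $\eta(\Lambda)$. The first case (building a short basis and sampling with [12, Theorem 4.1]) is not modeled.

```python
import math
import random


def combine_to_range(params, R):
    """Greedily pick samples until sqrt(sum of s_i^2) reaches R (units of eta(Lambda)).

    Assumes each parameter lies in [R/n, R] and there are at least n^2 of them, as in
    the second case of the proof. Summing the chosen y_i (z = (1, ..., 1)) is then
    covered by the convolution theorem, which only needs s_i >= sqrt(2) * eta(Lambda).
    """
    chosen, total_sq = [], 0.0
    for index, s_i in enumerate(params):
        chosen.append(index)
        total_sq += s_i * s_i
        if total_sq >= R * R:
            # Before this step total_sq < R^2, and s_i <= R, so now total_sq < 2 R^2.
            return chosen, math.sqrt(total_sq)
    raise ValueError("not enough samples to reach R")


rng = random.Random(3)
n, R = 16, 50.0
params = [rng.uniform(max(R / n, math.sqrt(2)), R) for _ in range(n * n)]
indices, s = combine_to_range(params, R)
```

Stopping as soon as $\sum s_i^2 \ge R^2$ is what keeps the combined parameter below $\sqrt{2}R$.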

The  next  lemma  is  the  heart  of  our  reduction.  The  novel  part,  corresponding  to  the  properties  described  in 
the  second  item,  is  a  way  of  using  a  collision-finding  oracle  to  reduce  the  Gaussian  width  of  samples  drawn 
from  a  lattice.  The  first  item  corresponds  to  the  guarantees  provided  by  previous  reductions. 

Lemma 3.6. Let $m, n$ be integers, $S = \{z \in \mathbb{Z}^m \setminus \{0\} \mid \|z\| \le \beta \wedge \|z\|_\infty \le \beta_\infty\}$ for some real $\beta \ge \beta_\infty > 0$, and $q$ an integer modulus with at most $\mathrm{poly}(n)$ integer divisors less than $\beta_\infty$. There is a probabilistic polynomial time reduction that, on input any basis $B$ of a lattice $\Lambda$ and sufficiently many samples $(y_i, s_i)$ where $s_i \ge \sqrt{2}q \cdot \eta(\Lambda)$ and $y_i$ has distribution $D_{\Lambda, s_i}$, and given access to an $\mathrm{SIS}(m, n, q, S)$ oracle (that finds collisions $z \in S$ with nonnegligible probability), outputs (with overwhelming probability) a sample $(y, s)$ with $\min_i s_i / q \le s \le (\beta/q) \cdot \max_i s_i$, and $y \in \Lambda$ such that:

• $E[\|y\|] \le (\beta\sqrt{n}/q) \cdot \max_i s_i$, and for any subspace $H \subset \mathbb{R}^n$ of dimension at most $n - 1$, with probability at least $1/10$ we have $y \notin H$.

• Moreover, if each $s_i \ge \sqrt{2}\beta_\infty q \cdot \eta(\Lambda)$, then the distribution of $y$ is statistically close to $D_{\Lambda, s}$.

Proof. Let $\mathcal{A}$ be the collision-finding oracle. Without loss of generality, we can assume that whenever $\mathcal{A}$ outputs a valid collision $z \in S$, we have that $\gcd(z)$ divides $q$. This is so because for any integer vector $z$, if $Az = 0 \bmod q$ then also $A((g/d)z) = 0 \bmod q$, where $d = \gcd(z)$ and $g = \gcd(d, q)$. Moreover, $(g/d)z \in S$ holds true and $\gcd((g/d)z) = \gcd(z, q)$ divides $q$. Let $d$ be such that $\mathcal{A}$ outputs, with non-negligible probability, a valid collision $z$ satisfying $\gcd(z) = d$. Such a $d$ exists because $\gcd(z)$ is bounded by $\beta_\infty$ and divides $q$, so by assumption there are only polynomially many possible values of $d$. Let $q' = q/d$, which is an integer. By increasing $m$ and using standard amplification techniques, we can make the probability that $\mathcal{A}$ outputs such a collision (satisfying $z \in S$, $Az = 0 \pmod{q}$ and $\gcd(z) = d$) exponentially close to 1.

Let $(y_i, s_i)$ for $i = 1, \ldots, m$ be input samples, where $y_i$ has distribution $D_{\Lambda, s_i}$. Write each $y_i$ as $y_i = B a_i \bmod q'\Lambda$ for $a_i \in \mathbb{Z}_{q'}^n$. Since $s_i \ge q'\eta(\Lambda)$, the distribution of $a_i$ is statistically close to uniform over $\mathbb{Z}_{q'}^n$. Let $A = [a_1 \mid \cdots \mid a_m] \in \mathbb{Z}_{q'}^{n \times m}$, and choose $A' \in \mathbb{Z}_d^{n \times m}$ uniformly at random. Since $A$ is statistically close to uniform over $\mathbb{Z}_{q'}^{n \times m}$, the matrix $A + q'A'$ is statistically close to uniform over $\mathbb{Z}_q^{n \times m}$. Call the oracle $\mathcal{A}$ on input $A + q'A'$, and obtain (with overwhelming probability) a nonzero $z \in S$ with $\gcd(z) = d$, $\|z\| \le \beta$, $\|z\|_\infty \le \beta_\infty$ and $(A + q'A')z = 0 \bmod q$. Notice that $q'A'z = qA'(z/d) = 0 \bmod q$ because $z/d$ is an integer vector. Therefore $Az = 0 \bmod q$. Finally, the reduction outputs $(y, s)$, where $y = \sum_i z_i y_i / q$ and $s = \sqrt{\sum_i (s_i z_i)^2} / q$. Notice that $\sum_i z_i y_i \in q\Lambda + B(\sum_i z_i a_i) = q\Lambda + B(Az) = q\Lambda$, because $\gcd(z) = d$ (so each $q' z_i \in q\mathbb{Z}$) and $Az = 0 \bmod q$; hence $y \in \Lambda$.

Notice that $s$ satisfies the stated bounds because $z$ is a nonzero integer vector. We next analyze the distribution of $y$. For any fixed $a_i$, the conditional distribution of each $y_i$ is $D_{q'\Lambda + Ba_i, s_i}$, where $s_i \ge \sqrt{2}\eta(q'\Lambda)$. The claim on $E[\|y\|]$ then follows from [19, Lemma 2.11 and Lemma 4.3] and Hölder's inequality. The claim on the probability that $y \notin H$ was initially shown in the preliminary version of [19]; see also [24, Lemma 3.15].




Now assume that $s_i \ge \sqrt{2}\beta_\infty q \cdot \eta(\Lambda) \ge \sqrt{2}\|z\|_\infty \cdot \eta(q'\Lambda)$ for all $i$. By Theorem 3.3 the distribution of $y$ is statistically close to $D_{Y/q, s}$ where $Y = \gcd(z) \cdot q'\Lambda + B(Az)$. Using $Az = 0 \bmod q$ and $\gcd(z) = d$, we get $Y = q\Lambda$. Therefore $y$ has distribution statistically close to $D_{\Lambda, s}$, as claimed. □
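The first step of the proof of Lemma 3.6, rescaling a collision so that its gcd divides $q$, and the output parameter $s$ can be checked on small numbers. In this minimal Python sketch the matrices, vectors and parameters are arbitrary examples chosen for illustration.

```python
from functools import reduce
from math import gcd, sqrt


def normalize_collision(z, q):
    """Map a nonzero solution z to (g/d) z, where d = gcd(z) and g = gcd(d, q).

    The result is still a solution mod q, is no longer than z, and its gcd (= g) divides q.
    """
    d = reduce(gcd, (abs(v) for v in z))
    g = gcd(d, q)
    return [(g * v) // d for v in z]


def is_solution(A, z, q):
    """Check A z = 0 (mod q) for a matrix A given as a list of rows."""
    return all(sum(a * v for a, v in zip(row, z)) % q == 0 for row in A)


def output_parameter(s, z, q):
    """Parameter s = sqrt(sum_i (s_i z_i)^2) / q of the output y = sum_i z_i y_i / q."""
    return sqrt(sum((si * zi) ** 2 for si, zi in zip(s, z))) / q


# Example: q = 10 and z = (3, -6, 9) with d = 3 coprime to q, so z shrinks to (1, -2, 3).
A = [[2, 1, 0], [1, 2, 1]]
z = [3, -6, 9]
z_norm = normalize_collision(z, 10)
```

When $d$ already divides $q$ (for instance $z = (4, 8, -4)$ with $q = 12$) the vector is left unchanged, since then $g = d$.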

Building on Lemma 3.6, our next lemma shows that for any $q \ge \beta \cdot n^{\Omega(1)}$, a collision-finding oracle can be used to obtain Gaussian samples of width close to $2\beta\beta_\infty \cdot \eta(\Lambda)$.

Lemma 3.7. Let $m, n, q, S$ be as in Lemma 3.6, and also assume $q/\beta \ge n^\delta$ for some constant $\delta > 0$. There is an efficient reduction that, on input any basis $B$ of an $n$-dimensional lattice $\Lambda$, an upper bound $\eta \ge \eta(\Lambda)$, and given access to an $\mathrm{SIS}(m, n, q, S)$ oracle (finding collisions $z \in S$ with nonnegligible probability), outputs (with overwhelming probability) a sample $(y, s)$ where $\sqrt{2}\beta_\infty \cdot \eta \le s \le 2\beta_\infty\beta \cdot \eta$ and $y$ has distribution statistically close to $D_{\Lambda, s}$.

Proof. By applying the LLL basis reduction algorithm [13] to the basis $B$, we can assume without loss of generality that $\|\tilde{B}\| \le 2^n \cdot \eta(\Lambda)$. Let $\omega_n$ be an arbitrary function in $n$ satisfying $\omega_n = \omega(\sqrt{\log n})$ and $\omega_n \le \sqrt{n}/2$.

The main procedure, described below, produces samples having parameters in the range $[1, q] \cdot \sqrt{2}\beta_\infty \cdot \eta$. On these samples we run the procedure from Lemma 3.5 (with $R = \sqrt{2}\beta_\infty q \cdot \eta$) to obtain samples having parameters in the range $[\sqrt{2}, 2] \cdot \beta_\infty q \cdot \eta$. Finally, we invoke the reduction from Lemma 3.6 on those samples to obtain a sample satisfying the conditions in the lemma statement.

The main procedure works in a sequence of phases $i = 0, 1, 2, \ldots$. In phase $i$, the input is a basis $B_i$ of $\Lambda$, where initially $B_0 = B$. The basis $B_i$ is used in the discrete Gaussian sampling algorithm of [12, Theorem 4.1] to produce samples $(y_i, s_i)$, where $s_i = \max\{\|\tilde{B}_i\| \cdot \omega_n, \sqrt{2}\beta_\infty\eta\} \ge \sqrt{2}\beta_\infty\eta$ and $y_i$ has distribution statistically close to $D_{\Lambda, s_i}$. Phase $i$ either manages to produce a sample $(y, s)$ with $s$ in the desired range $[1, q] \cdot \sqrt{2}\beta_\infty\eta$, or it produces a new basis $B_{i+1}$ for which $\|\tilde{B}_{i+1}\| \le \|\tilde{B}_i\|/2$, which is the input to the next phase. The number of phases before termination is clearly polynomial in $n$, by hypothesis on $B$.

If $\|\tilde{B}_i\| \cdot \omega_n \le \sqrt{2}q\beta_\infty\eta$, then this already gives samples with $s_i \in [1, q] \cdot \sqrt{2}\beta_\infty\eta$ in the desired range, and we can terminate the main procedure. So, we may assume that $s_i = \|\tilde{B}_i\| \cdot \omega_n > \sqrt{2}q\beta_\infty\eta$. Each phase $i$ proceeds in some constant number $c > 1/\delta$ of sub-phases $j = 1, 2, \ldots, c$, where the inputs to the first sub-phase are the samples $(y_i, s_i)$ generated as described above. We recall that these samples satisfy $s_i > \sqrt{2}q\beta_\infty\eta$. The same will be true for the samples passed as input to all subsequent sub-phases. So, each sub-phase receives as input samples $(y, s)$ satisfying all the hypotheses of Lemma 3.6, and we can run the reduction from that lemma to generate new samples $(y', s')$ having parameters $s'$ bounded from above by $s_i \cdot (\beta/q)^j$ and from below by $\sqrt{2}\beta_\infty\eta$. If any of the produced samples satisfies $s' \le q\sqrt{2}\beta_\infty\eta$, then we can terminate the main procedure with $(y', s')$ as output. Otherwise, all samples produced during the sub-phase satisfy $s' > q\sqrt{2}\beta_\infty\eta$, and they can be passed as input to the next sub-phase. Notice that the total runtime of all the sub-phases is $\mathrm{poly}(n)^c$, because each invocation of the reduction from Lemma 3.6 relies on $\mathrm{poly}(n)$ invocations of the reduction in the previous sub-phase; this is why we need to limit the number of sub-phases to a constant $c$.

If phase $i$ ends up running all its sub-phases without ever finding a sample with $s' \in [1, q] \cdot \sqrt{2}\beta_\infty\eta$, then it has produced samples whose parameters are bounded by $s' \le (\beta/q)^c \cdot s_i \le s_i/n^{\delta c} < s_i/n$. It uses $n^2$ of these samples, which with overwhelming probability have lengths all bounded by $s_i/\sqrt{n}$, and include $n$ linearly independent vectors. It transforms those vectors into a basis $B_{i+1}$ with $\|\tilde{B}_{i+1}\| \le s_i/\sqrt{n} \le \|\tilde{B}_i\|\omega_n/\sqrt{n} \le \|\tilde{B}_i\|/2$, as input to the next phase. □
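Two numeric facts keep the main procedure of Lemma 3.7 polynomial: a constant number $c > 1/\delta$ of sub-phases shrinks parameters by $(\beta/q)^c \le n^{-\delta c} < 1/n$, and a failed phase halves $\|\tilde{B}_i\|$ because $\omega_n \le \sqrt{n}/2$. The short Python check below evaluates both for sample values; $n$, $\delta$ and the choice $\omega_n = \log_2 n$ are illustrative assumptions.

```python
import math


def subphase_count(delta):
    """Smallest integer c with c > 1/delta, the number of sub-phases per phase."""
    return math.floor(1 / delta) + 1


n, delta = 1024, 0.25          # illustrative: q / beta = n**delta
c = subphase_count(delta)      # 5
shrink = (n ** -delta) ** c    # (beta/q)^c after c sub-phases, here n**-1.25
omega_n = math.log2(n)         # grows like omega(sqrt(log n)); at this n it is <= sqrt(n)/2
basis_factor = omega_n / math.sqrt(n)   # ||B~_{i+1}|| <= basis_factor * ||B~_i||
```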

We can now prove our main theorem, reducing worst-case lattice problems with $\max\{1, \beta\beta_\infty/q\} \cdot \tilde{O}(\beta\sqrt{n})$ approximation factors to SIS, when $q \ge \beta \cdot n^{\Omega(1)}$.
Theorem 3.8. Let $m, n$ be integers, $S = \{z \in \mathbb{Z}^m \setminus \{0\} \mid \|z\| \le \beta \wedge \|z\|_\infty \le \beta_\infty\}$ for some real $\beta \ge \beta_\infty > 0$, and $q \ge \beta \cdot n^\delta$ be an integer modulus with at most $\mathrm{poly}(n)$ integer divisors less than $\beta_\infty$, for some constant $\delta > 0$. For some $\gamma = \max\{1, \beta\beta_\infty/q\} \cdot \tilde{O}(\beta\sqrt{n})$, there is an efficient reduction from $\mathrm{SIVP}^\eta_\gamma$ (and hence also from standard $\mathrm{SIVP}_{\gamma \cdot \omega_n}$) on $n$-dimensional lattices to $S$-collision finding for $\mathrm{SIS}(m, n, q)$ with non-negligible advantage.

Proof. Given an input basis $B$ of a lattice $\Lambda$, we can apply the LLL algorithm to obtain a $2^n$-approximation to $\eta(\Lambda)$, and by scaling we can assume that $\eta(\Lambda) \in [1, 2^n]$. For $i = 1, \ldots, n$, we run the procedure described below for each hypothesized upper bound $\eta_i = 2^i$ on $\eta(\Lambda)$. Each call to the procedure either fails, or returns a set of linearly independent vectors in $\Lambda$ whose lengths are all bounded by $(\gamma/2) \cdot \eta_i$. We return the first such obtained set (i.e., for the minimal value of $i$). As we show below, as long as $\eta_i \ge \eta(\Lambda)$ the procedure returns a set of vectors with overwhelming probability. Since some $\eta_i \in [1, 2) \cdot \eta(\Lambda)$, our reduction solves $\mathrm{SIVP}^\eta_\gamma$ with overwhelming probability, as claimed.

The procedure invokes the reduction from Lemma 3.7 with $\eta = \eta_i$ to obtain samples with parameters in the range $[\sqrt{2}\beta_\infty, 2\beta_\infty\beta] \cdot \eta$. On these samples we run the procedure from Lemma 3.5 with $R = \max\{\sqrt{2}q, 2\beta\beta_\infty\}$ to obtain samples having parameters in the range $[R, \sqrt{2}R] \cdot \eta$. On such samples we repeatedly run (using independent samples each time) the reduction from Lemma 3.6. After enough runs, we obtain with overwhelming probability a set of linearly independent lattice vectors all having lengths at most $(\gamma/2) \cdot \eta$, as required. □

4  Hardness  of  LWE  with  Small  Uniform  Errors 

In  this  section  we  prove  the  hardness  of  inverting  the  LWE  function  even  when  the  error  vectors  have  very 
small  entries,  provided  the  number  of  samples  is  sufficiently  small.  We  proceed  similarly  to  [23,  4],  by  using 
the  LWE  assumption  (for  discrete  Gaussian  error)  to  construct  a  lossy  family  of  functions  with  respect  to 
a  uniform  distribution  over  small  inputs.  However,  the  parameterization  we  obtain  is  different  from  those 
in  [23,  4],  allowing  us  to  obtain  pseudorandomness  of  LWE  under  very  small  (e.g.,  binary)  inputs,  for  a 
number  of  LWE  samples  that  exceeds  the  LWE  dimension. 

Our results and proofs are more naturally formulated using the SIS function family. So, we will first study the problem in terms of SIS, and then reformulate the results in terms of LWE using Proposition 2.9. We recall that the main difference between this section and Section 3 is that here we consider parameters for which the resulting functions are essentially injective, or more formally, statistically second-preimage resistant. The following lemma gives sufficient conditions that ensure this property.

Lemma 4.1. For any integers $m, k, q, s$ and set $X \subseteq [s]^m$, the function family $\mathrm{SIS}(m, k, q)$ is (statistically) $\epsilon$-second-preimage resistant with respect to the uniform input distribution $U(X)$ for $\epsilon = |X| \cdot (s'/q)^k$, where $s'$ is the largest factor of $q$ smaller than $s$.

Proof. Let $x \leftarrow U(X)$ and $A \leftarrow \mathrm{SIS}(m, k, q)$ be chosen at random. We want to evaluate the probability that there exists an $x' \in X \setminus \{x\}$ such that $Ax = Ax' \pmod{q}$, or, equivalently, $A(x - x') = 0 \pmod{q}$. Fix any two distinct vectors $x, x' \in X$ and let $z = x - x'$. The vector $Az \bmod q$ is distributed uniformly at random in $(d\mathbb{Z}/q\mathbb{Z})^k$, where $d = \gcd(q, z_1, \ldots, z_m)$. All coordinates of $z$ are in the range $z_i \in \{-(s-1), \ldots, (s-1)\}$, and at least one of them is nonzero. Therefore, since $d$ divides $q$ and $d \le s - 1$, $d$ is at most $s'$ and $|(d\mathbb{Z}/q\mathbb{Z})^k| = (q/d)^k \ge (q/s')^k$. By a union bound (over $x' \in X \setminus \{x\}$) for any $x$, the probability that there is a second preimage $x'$ is at most $(|X| - 1)(s'/q)^k$. □
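For tiny parameters, the bound of Lemma 4.1 can be compared with the exact second-preimage probability by enumeration. This Python sketch assumes a single row ($k = 1$), $X = [s]^m$ with $s = 2$ (binary inputs) and a prime modulus; all values are illustrative.

```python
import itertools


def largest_factor_below(q, s):
    """Largest divisor of q that is smaller than s (the quantity s' in Lemma 4.1)."""
    return max(d for d in range(1, s) if q % d == 0)


def second_preimage_probability(q, m, s):
    """Exact Pr over A in Z_q^{1 x m} (k = 1) and x uniform in [s]^m that some
    x' != x in [s]^m satisfies A x = A x' (mod q)."""
    X = list(itertools.product(range(s), repeat=m))
    hits = 0
    for A in itertools.product(range(q), repeat=m):
        counts = {}
        for x in X:
            key = sum(a * v for a, v in zip(A, x)) % q
            counts[key] = counts.get(key, 0) + 1
        # x has a second preimage exactly when its image is shared with another input.
        hits += sum(c for c in counts.values() if c > 1)
    return hits / (q ** m * len(X))


q, m, s, k = 17, 3, 2, 1
size_X = s ** m
epsilon = size_X * (largest_factor_below(q, s) / q) ** k   # 8/17
prob = second_preimage_probability(q, m, s)
```

For prime $q > s$ the factor $s'$ is 1, so the bound reduces to $|X|/q^k$.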
We remark that, as shown in Section 3, even for parameter settings that do not fall within the range specified in Lemma 4.1, $\mathrm{SIS}(m, k, q)$ is collision resistant, and therefore also (computationally) second-preimage resistant. This is all that is needed in the rest of this section. However, when $\mathrm{SIS}(m, k, q)$ is not statistically second-preimage resistant, the one-wayness proof that follows (see Theorem 4.5) is not very interesting: typically, in such settings, $\mathrm{SIS}(m, k, q)$ is also statistically uninvertible, and the one-wayness of $\mathrm{SIS}(m, k, q)$ directly follows from Lemma 2.2. So, below we focus on parameter settings covered by Lemma 4.1.

We prove the one-wayness of $\mathcal{F} = \mathrm{SIS}(m, k, q, X)$ with respect to the uniform input distribution $\mathcal{X} = U(X)$ by building a lossy function family $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ where $\mathcal{L}$ is an auxiliary function family that we will prove to be uninvertible and computationally indistinguishable from $\mathcal{F}$. The auxiliary family $\mathcal{L}$ is derived from the following function family.

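Definition 4.2. For any probability distribution $\mathcal{Y}$ over $\mathbb{Z}^\ell$ and integer $m \ge \ell$, let $\mathcal{I}(m, \ell, \mathcal{Y})$ be the probability distribution over linear functions $[I \mid Y] : \mathbb{Z}^m \to \mathbb{Z}^\ell$ where $I$ is the $\ell \times \ell$ identity matrix, and $Y \in \mathbb{Z}^{\ell \times (m - \ell)}$ is obtained by choosing each column of $Y$ independently at random from $\mathcal{Y}$.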

We anticipate that we will set $\mathcal{Y}$ to the Gaussian input distribution $\mathcal{Y} = D^\ell_\sigma$ in order to make $\mathcal{L}$ indistinguishable from $\mathcal{F}$ under a standard LWE assumption. But for generality, we prove some of our results with respect to a generic distribution $\mathcal{Y}$.

The following lemma shows that for a bounded distribution $\mathcal{Y}$ (and appropriate parameters), $\mathcal{I}(m, \ell, \mathcal{Y})$ is (statistically) uninvertible.

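Lemma 4.3. Let $\mathcal{Y}$ be a probability distribution on $[\mathcal{Y}] \subseteq \{-\sigma, \ldots, \sigma\}^\ell$, and let $X \subseteq \{-s, \ldots, s\}^m$. Then $\mathcal{I}(m, \ell, \mathcal{Y})$ is $\epsilon$-uninvertible with respect to $U(X)$ for $\epsilon = (1 + 2s(1 + \sigma(m - \ell)))^\ell / |X|$.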

Proof. Let $f = [I \mid Y]$ be an arbitrary function in the support of $\mathcal{I}(m, \ell, \mathcal{Y})$. We know that $|y_{i,j}| \le \sigma$ for all $i, j$. We first bound the size of the image $|f(X)|$. By the triangle inequality, all the points in the image $f(X)$ have $\ell_\infty$ norm at most

$$\|f(u)\|_\infty \le \|u\|_\infty (1 + \sigma(m - \ell)) \le s(1 + \sigma(m - \ell)).$$

The number of integer vectors (in $\mathbb{Z}^\ell$) with such bounded norm is

$$(1 + 2s(1 + \sigma(m - \ell)))^\ell.$$

Dividing by the size of $X$ and using Lemma 2.4, the claim follows. □
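The counting step in the proof of Lemma 4.3 can be checked by enumerating the image of one fixed $f = [I \mid Y]$. In this Python sketch $Y$, $\ell$, $m$, $s$ and $\sigma$ are arbitrary illustrative values.

```python
import itertools


def image_size(Y, X):
    """|f(X)| for f = [I | Y] : Z^m -> Z^l, where Y is given as l rows of length m - l."""
    l = len(Y)
    image = set()
    for u in X:
        u1, u2 = u[:l], u[l:]
        image.add(tuple(u1[i] + sum(Y[i][j] * u2[j] for j in range(len(u2)))
                        for i in range(l)))
    return len(image)


l, m, s, sigma = 2, 4, 1, 1
Y = [[1, -1], [0, 1]]                                    # entries in {-sigma, ..., sigma}
X = list(itertools.product(range(-s, s + 1), repeat=m))  # X = {-s, ..., s}^m, |X| = 81
count_bound = (1 + 2 * s * (1 + sigma * (m - l))) ** l   # 7**2 = 49
epsilon = count_bound / len(X)                           # 49/81
size = image_size(Y, X)
```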

Lemma 4.3 applies to any distribution $\mathcal{Y}$ with bounded support. When $\mathcal{Y} = D^\ell_\sigma$ is a discrete Gaussian distribution, a slightly better bound can be obtained. (See also [4], which proves a similar lemma for a different, non-uniform input distribution $\mathcal{X}$.)

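Lemma 4.4. Let $\mathcal{Y} = D^\ell_\sigma$ be the discrete Gaussian distribution with parameter $\sigma > 0$, and let $X \subseteq \{-s, \ldots, s\}^m$. Then $\mathcal{I}(m, \ell, \mathcal{Y})$ is $\epsilon$-uninvertible with respect to $U(X)$, for $\epsilon = O(\sigma m s/\sqrt{\ell})^\ell / |X| + 2^{-\Omega(m)}$.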

Proof. Again, by Lemma 2.4 it is enough to bound the expected size of $f(X)$ when $f \leftarrow \mathcal{I}(m, \ell, \mathcal{Y})$ is chosen at random. Remember that $f = [I \mid Y]$ where $Y \leftarrow D_\sigma^{\ell \times (m - \ell)}$. Since the entries of $Y \in \mathbb{Z}^{\ell \times (m - \ell)}$ are independent mean-zero subgaussians with parameter $\sigma$, by a standard bound from the theory of random matrices, the largest singular value $s_1(Y) = \max_{0 \ne x \in \mathbb{R}^{m-\ell}} \|Yx\|/\|x\|$ of $Y$ is at most $\sigma \cdot O(\sqrt{\ell} + \sqrt{m - \ell}) = \sigma \cdot O(\sqrt{m})$, except with probability $2^{-\Omega(m)}$. We now bound the $\ell_2$ norm of all vectors in the image $f(X)$.

Let $u = (u_1, u_2) \in X$, with $u_1 \in \mathbb{Z}^\ell$ and $u_2 \in \mathbb{Z}^{m-\ell}$. Then

$$\begin{aligned}
\|f(u)\| &= \|u_1 + Yu_2\| \\
&\le \|u_1\| + \|Yu_2\| \\
&\le (\sqrt{\ell} + s_1(Y)\sqrt{m - \ell})\, s \\
&\le (\sqrt{\ell} + \sigma \cdot O(\sqrt{m})\sqrt{m - \ell})\, s \\
&\le O(\sigma m s).
\end{aligned}$$

The number of integer points in the $\ell$-dimensional zero-centered ball of radius $R = O(\sigma m s)$ can be bounded by a simple volume argument, as $|f(X)| \le (R + \sqrt{\ell}/2)^\ell V_\ell = O(\sigma m s/\sqrt{\ell})^\ell$, where $V_\ell = \pi^{\ell/2}/(\ell/2)!$ is the volume of the $\ell$-dimensional unit ball. Dividing by the size of $X$ and accounting for the rare event that $s_1(Y)$ is not bounded as above, we get that $\mathcal{I}(m, \ell, \mathcal{Y})$ is $\epsilon$-uninvertible for $\epsilon = O(\sigma m s/\sqrt{\ell})^\ell / |X| + 2^{-\Omega(m)}$. □
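The volume argument in the proof of Lemma 4.4, which bounds the number of integer points in a radius-$R$ ball of dimension $\ell$ by $(R + \sqrt{\ell}/2)^\ell V_\ell$, can be checked numerically in low dimension. The sketch computes $V_\ell$ as $\pi^{\ell/2}/\Gamma(\ell/2 + 1)$, which agrees with $\pi^{\ell/2}/(\ell/2)!$ for even $\ell$; the dimensions and radii are illustrative.

```python
import itertools
import math


def integer_points_in_ball(dim, R):
    """Count integer vectors of Euclidean norm at most R in Z^dim."""
    r = math.floor(R)
    return sum(1 for v in itertools.product(range(-r, r + 1), repeat=dim)
               if sum(x * x for x in v) <= R * R)


def volume_bound(dim, R):
    """(R + sqrt(dim)/2)^dim * V_dim, with V_dim = pi^(dim/2) / Gamma(dim/2 + 1)."""
    unit_ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    return (R + math.sqrt(dim) / 2) ** dim * unit_ball


checks = [(dim, R, integer_points_in_ball(dim, R), volume_bound(dim, R))
          for dim in (1, 2, 3) for R in (1.5, 2.5, 6.3)]
```

The bound holds because the unit cubes centered at the counted points are disjoint and all lie inside the ball of radius $R + \sqrt{\ell}/2$.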


We can now prove the one-wayness of the SIS function family by defining and analyzing an appropriate lossy function family. The parameters below are set up to expose the connection with LWE, via Proposition 2.9: $\mathrm{SIS}(m, m - n, q)$ corresponds to LWE in $n$ dimensions (given $m$ samples), whose one-wayness we are proving, while $\mathrm{SIS}(\ell = m - n + k, m - n, q)$ corresponds to LWE in $k < n$ dimensions, whose pseudorandomness we are assuming.

Theorem 4.5. Let $q$ be a modulus and let $\mathcal{X}, \mathcal{Y}$ be two distributions over $\mathbb{Z}^m$ and $\mathbb{Z}^\ell$ respectively, where $\ell = m - n + k$ for some $0 < k < n < m$, such that

1. $\mathcal{I}(m, \ell, \mathcal{Y})$ is uninvertible with respect to input distribution $\mathcal{X}$,

2. $\mathrm{SIS}(\ell, m - n, q)$ is pseudorandom with respect to input distribution $\mathcal{Y}$, and

3. $\mathrm{SIS}(m, m - n, q)$ is second-preimage resistant with respect to input distribution $\mathcal{X}$.

Then $\mathcal{F} = \mathrm{SIS}(m, m - n, q)$ is one-way with respect to input distribution $\mathcal{X}$.

In particular, if $\mathrm{SIS}(\ell, m - n, q)$ is pseudorandom with respect to the discrete Gaussian distribution $\mathcal{Y} = D^\ell_\sigma$, then $\mathrm{SIS}(m, m - n, q)$ is $(2\epsilon + 2^{-\Omega(m)})$-one-way with respect to the uniform input distribution $\mathcal{X} = U(X)$ over any set $X \subseteq \{-s, \ldots, s\}^m$ satisfying

$$(C\sigma m s/\sqrt{\ell})^\ell / \epsilon \le |X| \le \epsilon \cdot (q/s')^{m-n},$$

where $s'$ is the largest divisor of $q$ that is smaller than or equal to $2s$, and $C$ is the universal constant hidden by the $O(\cdot)$ notation from Lemma 4.4.

Proof. We will prove that $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family, where $\mathcal{F} = \mathrm{SIS}(m, m - n, q)$ and $\mathcal{L} = \mathrm{SIS}(\ell, m - n, q) \circ \mathcal{I}(m, \ell, \mathcal{Y})$. It then follows from Lemma 2.3 that both $\mathcal{F}$ and $\mathcal{L}$ are one-way function families with respect to input distribution $\mathcal{X}$. Notice that $\mathcal{F}$ is second-preimage resistant with respect to $\mathcal{X}$ by assumption. The indistinguishability of $\mathcal{L}$ and $\mathcal{F}$ follows immediately from the pseudorandomness of $\mathrm{SIS}(\ell, m - n, q)$ with respect to $\mathcal{Y}$, by a standard hybrid argument. So, in order to prove that $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family, it suffices to prove that $\mathcal{L}$ is uninvertible with respect to $\mathcal{X}$. This follows from applying Lemma 2.5 to the function family $\mathcal{I}(m, \ell, \mathcal{Y})$, which is uninvertible by assumption. This proves the first part of the theorem.
Now consider the particular instantiation. Let $\mathcal{X} = \mathcal{U}(X)$ be the uniform distribution over a set $X \subseteq \{-s, \ldots, s\}^m$ whose size satisfies the inequalities in the theorem statement, and let $\mathcal{Y} = D^{\ell}_{\alpha}$. Since $|X| \cdot (s'/q)^{m-n} < \varepsilon$, by Lemma 4.1, $\mathrm{SIS}(m, m-n, q)$ is (statistically) second-preimage resistant with respect to input distribution $\mathcal{X}$. Moreover, since $(C'\alpha m s/\sqrt{\ell})^{\ell}/|X| < \varepsilon$, by Lemma 4.4, $\mathcal{I}(m, \ell, \mathcal{Y})$ is $(\varepsilon + 2^{-\Omega(m)})$-uninvertible with respect to input distribution $\mathcal{X}$. □

In  order  to  conclude  that  the  LWE  function  is  pseudorandom  (under  worst-case  lattice  assumptions)  for 
uniformly  random  small  errors,  we  combine  Theorem  4.5  with  Corollary  2.14,  instantiating  the  parameters 
appropriately.  For  simplicity,  we  focus  on  the  important  case  of  a  prime  modulus  q.  Nearly  identical  results 
for  composite  moduli  (e.g.,  those  divisible  by  only  small  primes)  are  also  easily  obtained  from  Corollary  2.14, 
or by using either Theorem 2.13 or Theorem 2.12.

Theorem 4.6. Let $0 < k < n < m - \omega(\log k) < k^{O(1)}$, $\ell = m - n + k$, $s \ge (Cm)^{\ell/(n-k)}$ for a large enough universal constant $C$, and let $q$ be a prime such that $\max\{3\sqrt{k}, (4s)^{m/(m-n)}\} \le q \le k^{O(1)}$. For any set $X \subseteq \{-s, \ldots, s\}^m$ of size $|X| \ge s^m$, the $\mathrm{SIS}(m, m-n, q)$ (equivalently, $\mathrm{LWE}(m, n, q)$) function family is one-way (and pseudorandom) with respect to the uniform input distribution $\mathcal{X} = \mathcal{U}(X)$, under the assumption that $\mathrm{SIVP}_{\gamma}$ is (quantum) hard to approximate, in the worst case, on $k$-dimensional lattices to within a factor $\gamma = \tilde{O}(\sqrt{k} \cdot q)$.

A few notable instantiations are as follows. To obtain pseudorandomness for binary errors, we need $s = 2$ and $X = \{0, 1\}^m$. For this value of $s$, the condition $s \ge (Cm)^{\ell/(n-k)}$ can equivalently be rewritten as

$$m \le (n - k) \cdot \left(1 + \frac{1}{\log_2(Cm)}\right),$$

which can be satisfied by taking $k = n/(C' \log_2 n)$ and $m = n(1 + 1/(c \log_2 n))$ for any desired $c > 1$ and a sufficiently large constant $C' > 1/(1 - 1/c)$. For these values, the modulus should satisfy $q \ge (4s)^{m/(m-n)} = 8n^{3c} = k^{O(1)}$, and can be set to any sufficiently large prime $q = k^{O(1)}$.¹
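Both rewritings in the binary-error paragraph are elementary. The following worked derivation is our own expansion of them, using only $\ell = m - n + k$, $s = 2$, and the stated choice of $m$:

```latex
\begin{align*}
2 \ge (Cm)^{\ell/(n-k)}
  &\iff (m - n + k)\,\log_2(Cm) \le n - k \\
  &\iff m - (n - k) \le \frac{n - k}{\log_2(Cm)}
   \iff m \le (n - k)\Bigl(1 + \frac{1}{\log_2(Cm)}\Bigr), \\[4pt]
m = n\Bigl(1 + \frac{1}{c\log_2 n}\Bigr)
  &\implies \frac{m}{m - n} = \frac{1 + 1/(c\log_2 n)}{1/(c\log_2 n)} = c\log_2 n + 1 \\
  &\implies (4s)^{m/(m-n)} = 8^{\,c\log_2 n + 1} = 8 \cdot 2^{3c\log_2 n} = 8\,n^{3c}.
\end{align*}
```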

Notice that for binary errors, both the worst-case lattice dimension $k$ and the number $m - n$ of “extra” LWE samples (i.e., the number of samples beyond the LWE dimension $n$) are sublinear in the LWE dimension $n$: we have $k = O(n/\log n)$ and $m - n = O(n/\log n)$. This corresponds to both a stronger worst-case security assumption and a less useful LWE problem. By using larger errors, say, bounded by $s = n^{\varepsilon}$ for some constant $\varepsilon > 0$, it is possible to make both the worst-case lattice dimension $k$ and the number of extra samples $m - n$ into (small) linear functions of the LWE dimension $n$, which may be sufficient for some cryptographic applications of LWE. Specifically, for any constant $\varepsilon < 1$, one may set $k = (\varepsilon/3)n$ and $m = (1 + \varepsilon/3)n$, which are easily verified to satisfy all the hypotheses of Theorem 4.6 when $q = k^{O(1)}$ is sufficiently large. These parameters correspond to $(\varepsilon/3)n = \Omega(n)$ extra samples (beyond the LWE dimension $n$), and to the worst-case hardness of lattice problems in dimension $(\varepsilon/3)n = \Omega(n)$. Notice that for $\varepsilon < 1/2$, this version of LWE has much smaller errors than allowed by previous LWE hardness proofs, and it would be subject to subexponential-time attacks [2] if the number of samples were not restricted. Our result shows that if the number of samples is limited to $(1 + \varepsilon/3)n$, then LWE maintains its provable security properties and conjectured exponential-time hardness in the dimension $n$.
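The claim that $k = (\varepsilon/3)n$ and $m = (1 + \varepsilon/3)n$ satisfy the hypotheses of Theorem 4.6 can be checked directly. The derivation below is our own, not part of the original text; it also shows where the restriction $\varepsilon < 1$ comes from:

```latex
\begin{align*}
\ell = m - n + k = \tfrac{2\varepsilon}{3}\,n, \qquad n - k = \bigl(1 - \tfrac{\varepsilon}{3}\bigr)n
  &\implies \frac{\ell}{n - k} = \frac{2\varepsilon}{3 - \varepsilon}, \\
s = n^{\varepsilon} \ge (Cm)^{2\varepsilon/(3 - \varepsilon)} \text{ for all large } n
  &\iff \frac{2\varepsilon}{3 - \varepsilon} < \varepsilon \iff \varepsilon < 1, \\
\frac{m}{m - n} = \frac{3 + \varepsilon}{\varepsilon}
  &\implies (4s)^{m/(m-n)} = 4^{1 + 3/\varepsilon}\, n^{3 + \varepsilon} = k^{O(1)}.
\end{align*}
```

The remaining hypothesis, $m - n = (\varepsilon/3)n = \omega(\log k)$, holds trivially because $k = \Theta(n)$.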

¹Here we have not tried to optimize the value of $q$, and smaller values of the modulus are certainly possible: a close inspection of the proof of Theorem 4.6 reveals that for binary errors, the condition $q \ge 8n^{3c}$ can be replaced by $q \ge n^{d}$ for any constant $d > c$.

One last instantiation allows for a linear number of samples $m = c \cdot n$ for any desired constant $c > 1$, which is enough for most applications of LWE in lattice cryptography. In this case we can choose (say) $k = n/2$, and it suffices to set the other parameters so that


$$s \ge (Cm)^{2c-1} \quad\text{and}\quad q \ge (4s)^{c/(c-1)} \ge 4^{c/(c-1)} \cdot (Ccn)^{2c+1+1/(c-1)} = k^{O(1)}.$$


(We can also obtain better lower bounds on $s$ and $q$ by letting $k$ be a smaller constant fraction of $n$.) This proves the hardness of LWE with uniform noise of polynomial magnitude $s = n^{O(1)}$ and any linear number of samples $m = O(n)$. Note that for $m = cn$, any instantiation of the parameters requires the magnitude $s$ of the errors to be at least $n^{c-1}$. For $c > 3/2$, this is more noise than is typically used in the standard LWE problem, which allows errors of magnitude as small as $O(\sqrt{n})$ but requires them to be independent and to follow a Gaussian-like distribution. The novelty in this last instantiation of Theorem 4.6 is that it allows for a much wider class of error distributions, including the uniform distribution and distributions where different components of the error vector are correlated.
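The exponents in the last two instantiations depend only on the ratios $m/n$ and $k/n$. The small Python sketch below is our own illustration; the function name and its parameters `c` (for $m/n$) and `kappa` (for $k/n$) are hypothetical. It computes the exponents exactly with rational arithmetic, which makes it easy to confirm the identity $(2c-1) \cdot c/(c-1) = 2c + 1 + 1/(c-1)$ used above, and the fact that $\ell/(n-k) > c - 1$ for every $0 < k < n$, which underlies the lower bound $s \ge n^{c-1}$.

```python
from fractions import Fraction


def theorem_4_6_exponents(c, kappa):
    """Return (e_s, e_q) for m = c*n and k = kappa*n, where Theorem 4.6 asks for
    s >= (C*m)**e_s with e_s = ell/(n - k), ell = m - n + k, and
    q >= (4*s)**e_q with e_q = m/(m - n).
    n cancels out of both ratios, so only c and kappa matter."""
    c, kappa = Fraction(c), Fraction(kappa)
    if not (c > 1 and 0 < kappa < 1):
        raise ValueError("need m > n and 0 < k < n")
    ell_over_n = c - 1 + kappa            # (m - n + k)/n
    e_s = ell_over_n / (1 - kappa)        # ell/(n - k)
    e_q = c / (c - 1)                     # m/(m - n)
    return e_s, e_q


if __name__ == "__main__":
    for c in (2, 3, Fraction(5, 2)):
        e_s, e_q = theorem_4_6_exponents(c, Fraction(1, 2))    # k = n/2
        # e_s * e_q is the exponent of (C*c*n) in the bound on q when s is minimal.
        print(c, e_s, e_s * e_q, 2 * c + 1 + Fraction(1) / (c - 1))
```

Exact `Fraction` arithmetic is used so that the identity is checked symbolically for each chosen `c`, not up to floating-point rounding.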

Proof of Theorem 4.6. We prove the one-wayness of $\mathrm{SIS}(m, m-n, q)$ (equivalently, $\mathrm{LWE}(m, n, q)$ via Proposition 2.9) using the second part of Theorem 4.5 with $\alpha = 3\sqrt{k}$. Using $\ell \ge k$ (so that $\alpha/\sqrt{\ell} \le 3$) and the primality of $q$ (so that $s' = 1$, because $q \ge 4s > 2s$), the conditions on the size of $X$ in Theorem 4.5 can be replaced by the simpler bounds

$$\frac{(3C'ms)^{\ell}}{\varepsilon} < |X| < \varepsilon \cdot q^{m-n},$$

or equivalently, by the requirement that the quantities $(3C'ms)^{\ell}/|X|$ and $|X|/q^{m-n}$ are negligible in $k$. For the first quantity, letting $C = 4C'$ and using $|X| \ge s^m$ and $s \ge (4C'm)^{\ell/(n-k)}$, we get that $(3C'ms)^{\ell}/|X| \le (3/4)^{\ell} \le (3/4)^{k}$ is exponentially small (in $k$). For the second quantity, using $|X| \le (2s+1)^m$ and $q \ge (4s)^{m/(m-n)}$, we get that $|X|/q^{m-n} \le (3/4)^m$ is also exponentially small.
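The two estimates in the proof can be sanity-checked on concrete numbers. The Python sketch below is our own illustration. It picks toy parameters for which $\ell/(n-k)$ and $m/(m-n)$ are integers, sets $s$ and $q$ to the smallest values the hypotheses allow, and evaluates both quantities exactly at the worst-case set sizes. The value `c_prime = 1` is an arbitrary stand-in for the unspecified constant $C'$ from Lemma 4.4. The modulus $q$ is not required to be prime here, since primality enters the argument only through $s' = 1$.

```python
from fractions import Fraction


def proof_quantities(m, n, k, c_prime=1):
    """Evaluate (3C'ms)^ell/|X| at |X| = s^m and |X|/q^(m-n) at |X| = (2s+1)^m,
    i.e., the worst case allowed for each quantity in the proof of Theorem 4.6.
    c_prime is a hypothetical placeholder for the constant C' of Lemma 4.4."""
    ell = m - n + k
    if not (0 < k < n < m) or ell % (n - k) or m % (m - n):
        raise ValueError("choose parameters with integer exponents")
    s = (4 * c_prime * m) ** (ell // (n - k))   # smallest s with s >= (4C'm)^(ell/(n-k))
    q = (4 * s) ** (m // (m - n))               # smallest q with q >= (4s)^(m/(m-n))
    first = Fraction((3 * c_prime * m * s) ** ell, s ** m)
    second = Fraction((2 * s + 1) ** m, q ** (m - n))
    return ell, first, second


if __name__ == "__main__":
    ell, first, second = proof_quantities(m=30, n=20, k=10)
    print(first == Fraction(3, 4) ** ell)       # bound (3/4)^ell is attained when s is minimal
    print(second <= Fraction(3, 4) ** 30)       # |X|/q^(m-n) <= (3/4)^m
```

Python integers are unbounded, so the very large powers involved are computed exactly and no logarithms are needed.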

Theorem 4.5 also requires the pseudorandomness of $\mathrm{SIS}(\ell, m-n, q)$ with respect to the discrete Gaussian input distribution $\mathcal{Y} = D^{\ell}_{\alpha}$, which can be based on the (quantum) worst-case hardness of SIVP on $k$-dimensional lattices using Corollary 2.14. (Notice the use of different parameters: $\mathrm{SIS}(m, m-n, q)$ in Corollary 2.14, and $\mathrm{SIS}(m-n+k, m-n, q)$ here.) After properly renaming the variables and using $\alpha = 3\sqrt{k}$, the hypotheses of Corollary 2.14 become $\omega(\log k) \le m - n \le k^{O(1)}$ and $3\sqrt{k} \le q \le k^{O(1)}$, which are all satisfied by the hypotheses of the theorem. The corresponding assumption is the worst-case hardness of $\mathrm{SIVP}_{\gamma}$ on $k$-dimensional lattices, for $\gamma = \tilde{O}(kq/\alpha) = \tilde{O}(\sqrt{k} \cdot q)$, as claimed. This concludes the proof of the one-wayness of LWE.

The pseudorandomness of LWE follows from the sample-preserving search-to-decision reduction of [17]. □
How to Delegate Secure Multiparty Computation to the Cloud

Abstract 

We  study  the  problem  of  verifiable  computation  in  the  presence  of  many  clients  who  rely  on  a  server  to 
perform  computations  over  inputs  privately  held  by  each  client.  This  generalizes  the  single-client  model  for 
verifiable  outsourced  computation  previously  studied  in  the  literature. 

We put forward a computational model and a strong simulation-based security definition for this task. We then present a new protocol that allows the clients to securely outsource an arbitrary polynomial-time computation over privately held inputs to a powerful server. At the end, the clients will be assured that the result of the computation is correct, while at the same time their data is protected from the server and from each other.

Our new protocol satisfies the crucial efficiency requirement of outsourced computation, where the work of the client is substantially smaller than what is required to compute the function. We use the amortized model of Gennaro et al., where the clients are allowed to invest in a one-time, computationally expensive preprocessing phase. Our protocol is secure in the real/ideal paradigm even when dishonest clients can collude with the server in order to learn honest parties’ inputs or in order to maliciously change the output of the computation.

Keywords:  verifiable  computation,  secure  multi-party  computation 
1  Introduction

Consider the following scenario: a set of computationally weak devices holding private inputs wishes to jointly compute a function F over those inputs. No single device has the power to compute F on its own, let alone to engage in a secure multiparty computation protocol such as [Yao82, GMW87, BOGW88, CCD88] for computing F. The devices can, however, access the services of a “computation provider” who can “help” them compute F. To maintain the privacy of their inputs, the clients need to engage in a cryptographic protocol where the provider does the bulk of the computation (i.e., computes F), while the computation and communication of each client is “low” (in particular, less than the time it takes to compute F). Nevertheless, the clients must be able to verify the correctness of the output of this protocol, even under the assumption that some corrupted clients might cooperate with a malicious provider to fool them into accepting an incorrect output or to learn their inputs.

One can think of this problem, called multi-client verifiable computation or multi-client delegation of computation, as a secure multi-party computation protocol between the clients and the provider (the cloud service), where, however, only the provider’s work is allowed to be proportional to the complexity of the function being computed (the function that computes the joint statistics). In this paper, we solve the above problem by relying only on standard polynomial-time cryptographic assumptions. We present a protocol that allows many computationally weak clients to securely outsource a computation over privately held inputs to a powerful server, in the presence of the most powerful adversarial model that can be considered, while minimizing the “on-line” computation of the clients. Since round complexity is of paramount importance in this line of work, we follow the convention of obtaining a solution in which the clients delegate the computation non-interactively (i.e., the clients and the server exchange a single message).

While a lot of work has been devoted to secure outsourced computation in the case of a single client interacting with a single server (see, for example, [Mic94, GKR08, GGP10, CKV10, AIK10]), research on the multi-client case is still at a preliminary stage, with very few works, which consider much weaker models of security (we discuss these works in detail later on).

1.1  Our  Model 

Before we state our main results, we first take a closer look at the model in which we work. In a nutshell, we obtain our results a) based on standard cryptographic hardness assumptions; b) in the strongest adversarial model, i.e., simulation-based security in the ideal/real paradigm when malicious clients may collude with the server; and c) with minimal communication between the clients, which takes place only when verifying the results of the computation. In explaining our model, we consider three important design principles that influence our choices: first, the cryptographic hardness assumption that we make; second, the corruption model (which parties the adversary can corrupt); and finally, the communication model (how much the clients need to interact with each other and with the server).

Hardness assumptions: standard cryptographic assumptions. Following the standard convention in cryptography, we are interested in constructing multi-client verifiable computation protocols based on standard cryptographic assumptions (i.e., without resorting to random oracles or non-falsifiable hardness assumptions). Furthermore, we are interested in obtaining solutions where the interaction between the clients and the server is minimal, i.e., only one message is sent in each direction between each client and the server. We note that obtaining such solutions is a difficult problem even in the single-client setting, as exemplified by the small number of known solutions [GGP10, CKV10, AIK10]. In particular, all known single-client non-interactive solutions based on standard assumptions work in an “amortized” computational model (also known as the pre-processing model) [GGP10]. In view of the above, in this work we also work in (a natural extension of) the pre-processing model, which we discuss later in this section.

Corruption model: simulation-based security. As we discussed earlier, it is quite natural to have a situation where one of the clients might collude with a corrupt server in order to either learn something about the honest client’s inputs or force the output of the computation to some value. Naturally, it would be highly desirable to construct protocols that are secure even against such adversaries. Furthermore, since simulation-based security in the real/ideal paradigm is the benchmark for security in cryptography, we would like to obtain protocols that are secure in this sense. Finally, we would like our protocols to be as widely applicable as possible; thus we choose to work in the strongest corruption model.

We remark that the outsourcing of multi-party computation has been studied in weaker security models [KMR11, CKKC13]. We discuss these works in more detail later in this section.

Communication model: minimal interaction between parties. Since we wish to rely only on standard cryptographic assumptions, we work in the pre-processing model. In the pre-processing model, the clients, at the start of the protocol, execute a preprocessing phase, which is a one-time stage in which the clients compute public as well as private information associated with the function F that they wish to outsource. The computation of each client in this phase is allowed to be proportional to the computational complexity of evaluating F. Since communication is at a premium, one would like protocols in which the interaction between the clients and the server (and amongst the clients) is minimized. Ideally, one would obtain a protocol in which the clients interact only in the pre-processing phase, then interact with the server once individually (by sending and receiving exactly one message), and then obtain the results of the computation. Unfortunately, this is impossible to achieve in our security model: one can easily see that if the clients do not interact with each other after exchanging one message with the server, then one cannot obtain a simulation-based secure protocol that is secure against a colluding client and server (more specifically, for a fixed input of the honest client, the colluding client and server would be able to obtain the output of the computation on several inputs of their choice, thus violating the requirements of a simulation-based definition). Thus, the clients need to interact with each other in order to obtain the results of the computation; the focus is then on minimizing this interaction.
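The parenthetical remark about colluding parties can be made concrete with a toy example of our own; it is not taken from the paper. Let the outsourced function be the hypothetical comparison $F(x_1, x_2) = [x_1 \le x_2]$. In the ideal world, a corrupted client learns a single bit about the honest input $x_1$. If the colluding client and server could instead evaluate $F(x_1, \cdot)$ on inputs of their choice, a binary search would recover $x_1$ completely, as the Python sketch below shows. This is why such a view cannot be simulated from a single ideal-world query.

```python
def residual_oracle(x1):
    """Hypothetical F(x1, x2) = [x1 <= x2] with the honest input x1 fixed;
    the colluding parties may call it on any x2 they like."""
    return lambda x2: x1 <= x2


def recover_honest_input(oracle, lo, hi):
    """Binary search for the hidden x1 in [lo, hi]; returns (x1, number of queries)."""
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(mid):          # x1 <= mid
            hi = mid
        else:                    # x1 > mid
            lo = mid + 1
    return lo, queries


if __name__ == "__main__":
    x1, used = recover_honest_input(residual_oracle(12345), 0, 2**20 - 1)
    print(x1, used)
```

About $\log_2$ of the input-domain size queries suffice, so even a modest number of repeated evaluations leaks the honest input entirely.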

The above choices (allowing the clients to perform expensive computation and communication during the off-line phase, but then restricting them to a single message exchange during the on-line phase) might seem artificial. Yet there are several practical scenarios where they are relevant. Consider the case of military coalitions, where the clients are armies from different countries that need to perform joint computations on data that each army may need to keep private. It is conceivable that the off-line phase will be performed over a trusted network before the deployment of soldiers in the field, and therefore computation and communication are not at a premium. The situation, however, changes dramatically during the on-line phase, where the input to the computation is obtained during actual combat operations and battery power and communication bandwidth might be severely limited.

Advantages  of  the  communication  model.  Our  communication  model  has  two  further  advantages: 

- The foremost advantage of our communication model is asynchronicity. Note that during the outsourcing of the computation, the clients need not be present at the same time. They can send their respective messages to the server at various points in time. Only when they wish to verify the computation do the clients have to synchronize and run a computation (which is unavoidable within our framework of security).

- Another advantage of the clients not communicating during the online phase is that they can batch together multiple computations and verify all of them together at the end.

Description  of  our  model.  Given  the  most  natural  and  useful  design  choices  above,  we  now  describe  the  model 
in  which  we  obtain  our  protocols.  Our  protocol  for  outsourcing  multi-party  computation  consists  of  three  phases: 
the  preprocessing  phase,  the  online  phase,  and  the  offline  phase. 

The  preprocessing  phase  is  a  one-time  stage  in  which  the  clients  compute  public  as  well  as  private  information 
associated  with  the  function  F  that  they  wish  to  outsource.  The  computation  of  each  client  in  this  phase  is  allowed 
to  be  proportional  to  the  computational  complexity  of  evaluating  F.  We  also  allow  clients  to  interact  with  each 
other in an arbitrary manner in this phase. This phase is executed only once and is independent of the clients’ inputs.
In the online phase, clients individually prepare public and private information about their respective inputs
and  send  a  single  message  to  the  server.  The  server,  upon  receiving  these  messages  from  all  the  clients,  sends 
back  a  single  message  to  each  of  the  clients  (the  messages  sent  by  the  server  to  each  of  the  clients  can  potentially 
be  different  from  one  another).  Note  that  we  do  not  allow  the  clients  to  communicate  directly  with  each  other  in 
this  phase.  The  computational  complexity  of  the  clients  in  this  phase  is  required  to  be  independent  of  F. 

Finally, in the offline phase, the clients interact with each other to decode the response provided to them by
the  server  and  obtain  the  result  of  the  computation.  We  will  focus  on  minimizing  the  interaction  between  the 
clients  in  this  phase.  Furthermore,  the  computational  complexity  of  the  clients  in  this  phase  is  also  required  to  be 
independent  of  F. 

We obtain a protocol that enjoys simulation-based security in the real/ideal paradigm and is secure even against an adversarial client who colludes with a malicious server. As we will show, requiring security against a colluding adversary and requiring a simulation-based definition of security present significant challenges that need to be overcome in order to construct a protocol for outsourcing multi-party computation. Finally, our protocol makes use of standard cryptographic assumptions (and not random oracles or non-falsifiable assumptions). We note here that our solution, just like the solutions for single-party verifiable computation [GGP10, CKV10], requires that the pre-processing phase be executed again in the event that the output of a computation is rejected by one of the clients. In other words, we cannot reveal to a malicious server that the result of the computation was rejected and then continue with another verifiable computation protocol with the same pre-processing information.

Alternative approaches to delegating multi-party computation. Note that if one were to resort to random oracles or non-falsifiable hardness assumptions [Nao03], then it would be easy to construct multi-client verifiable computation protocols. Very briefly, the clients can simply send their inputs to the server, and the server can return the result of the computation along with a succinct non-interactive argument (SNARG) [Mic94, GW11, BCCT12, GLR11, DFH12] proving that it evaluated the output honestly. Privacy of the clients’ inputs can be obtained through standard techniques (e.g., via the use of fully homomorphic encryption). However, this solution is unsatisfactory for our purposes because of the non-standard hardness assumption required to prove it secure.

Also, if we relax the security notion to consider only non-colluding adversaries (that is, a malicious client and a malicious server do not collude), and if we do not require the stronger simulation-based definition of security, then the work of Kamara, Mohassel, and Raykova [KMR11] shows how to outsource multi-party computation. With an important focus on removing interaction between clients, the work of Choi et al. [CKKC13] considers multi-client non-interactive verifiable computation in which soundness guarantees are provided against a malicious server when all clients are honest; they also define privacy guarantees separately against a server and against a client. We note that this is much weaker than the simulation-based security model that we work in, which captures soundness and privacy against malicious clients and a malicious server colluding with each other.

Finally, we remark that if we did allow the clients to interact in the online phase (sacrificing asynchronicity), then one could trivially obtain a protocol for outsourcing multi-party computation from any single-party protocol for outsourcing computation [GGP10, CKV10, AIK10]: in the online phase, the clients simply “simulate” a single party by running a secure computation protocol to compute the message sent by the client in the single-party protocol. As discussed above, in our view, this is a particularly unsatisfactory approach, and of limited interest.

1.2  Our  Results 

In this work, we show how to construct a secure protocol for two-party verifiable computation in the pre-processing model. We highlight the key features of our protocol:

•  In  our  solution,  the  clients  perform  work  proportional  to  F  only  in  the  pre-processing  phase  (executed 
once),  and  have  computational  complexity  independent  of  F  in  the  remainder  of  the  protocol  (the  online 
phase and the offline phase).

•  The clients exchange only a single message with the server (in the online phase), and hence our protocols are “non-interactive” with regard to interaction with the server.

•  Furthermore, a critical point of our protocol is that the clients do not have to interact with each other in the online phase; this allows the clients to batch together an unbounded number of computations before running the verification to obtain the results of the computations.

•  We  provide  simulation-based  security  via  the  real/ideal  security  paradigm  and  our  protocol  is  secure  against 
a  colluding  adversarial  client  and  adversarial  server.  This  provides  both  soundness  of  the  computation  (an 
honest  client  never  accepts  an  incorrect  output)  and  privacy  of  the  honest  client’s  inputs. 

We  then  show  how  to  extend  our  two-party  protocol  to  the  multi-party  setting  and  obtain  a  multi-party  outsourcing 
computation  protocol  with  the  same  features  as  above  (our  protocol  is  secure  against  a  constant  fraction  of 
adversarial  clients  who  may  collude  with  a  malicious  server).  Finally,  we  show  how  to  reduce  the  interaction 
between  the  clients  in  the  offline  phase. 

In other words, our protocol works in the strongest possible adversarial model and, apart from the preprocessing phase, has minimal interaction.

1.3  Overview 

Starting point. We start with the goal of trying to “bootstrap” a single-client delegation scheme into a delegation scheme for multiple clients. Thus, our first consideration is what kind of properties are desirable from a single-client delegation scheme in order to achieve our goal. Interestingly, we observe that constructing a multi-client delegation scheme against colluding adversaries turns out to be very sensitive to the choice of the underlying single-client delegation scheme. Specifically, recall that the construction of [GGP10] uses the authenticity property of Yao’s garbled circuits to enforce correctness of computation. However, the security of garbled circuits does not seem very amenable to analysis in the setting where the honest client may share secret keys with the adversary (which seems necessary for security against colluding adversaries).

In light of the above, we choose to instead work with the delegation scheme of Chung, Kalai, and Vadhan [CKV10]. We briefly recall it below.

Delegation Scheme of [CKV10]. Let $F$ be the functionality that we wish to outsource. The client picks a random $r$ and computes $F(r)$ in the preprocessing phase. Next, in the online phase, after receiving the input $x$, the client picks a random bit $b$ and sends either $(x, r)$ or $(r, x)$ to the server (depending on the bit $b$). The server must compute $F$ on both $x$ and $r$ and return the responses to the client. The client checks that $F(r)$ is correct and, if so, accepts $F(x)$. Now, suppose $x$ comes from the uniform distribution; then this protocol is sound, and a cheating server can succeed only with probability $1/2$ (as it cannot distinguish $(x, r)$ from $(r, x)$ with probability better than $1/2$). For arbitrary distributions this approach fails, but this can be rectified by having the client additionally pick a public key for an FHE scheme (in the preprocessing phase) and send $(\mathrm{Enc}_{pk}(x), \mathrm{Enc}_{pk}(r))$ or $(\mathrm{Enc}_{pk}(r), \mathrm{Enc}_{pk}(x))$, depending on the bit $b$, in the online phase. The server homomorphically evaluates the function $F$ and responds with $\mathrm{Enc}_{pk}(F(x))$ and $\mathrm{Enc}_{pk}(F(r))$. Now, this protocol is sound for arbitrary distributions of $x$. One can reduce the soundness error to a negligible quantity by picking random $r_1, \ldots, r_K$ and having the client pick $b_1, \ldots, b_K$ and send $(\mathrm{Enc}_{pk}(x), \mathrm{Enc}_{pk}(r_i))$ (or the other way around, depending on $b_i$). In order to make this protocol reusable with the same values of $r_1, \ldots, r_K$, [CKV10] need to run this entire protocol under one more layer of fully homomorphic encryption.
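The cut-and-choose check at the heart of [CKV10] can be illustrated with a toy Python model of our own. It omits the FHE layer entirely: rather than encrypting, it restricts the cheating server to strategies fixed before seeing the client's bits, which is the property the encryption is meant to provide. The function `F`, the modulus `MOD`, and all helper names are hypothetical. We also assume, for illustration, that the client accepts only if every $F(r_i)$ is correct and all $K$ copies of $F(x)$ agree. Under these assumptions the code computes exactly the probability that a server corrupting one guessed slot per pair gets a wrong $F(x)$ accepted, namely $2^{-K}$.

```python
from fractions import Fraction
from itertools import product

MOD = 101  # toy output space


def F(v):
    """Hypothetical stand-in for the outsourced function."""
    return (v * v + 3) % MOD


def encode(x, rs, bits):
    """Online message: pair x with each r_i, ordered by the secret bit b_i,
    so that x sits at position b_i. In [CKV10] every slot is an FHE ciphertext."""
    return [(x, r) if b == 0 else (r, x) for r, b in zip(rs, bits)]


def honest_server(pairs):
    return [(F(u), F(v)) for u, v in pairs]


def cheating_server(pairs, guess):
    """Evaluate F everywhere but corrupt slot guess[i] of pair i. The guess is
    fixed in advance, modelling a server that cannot tell which slot holds x."""
    responses = []
    for (u, v), g in zip(pairs, guess):
        out = [F(u), F(v)]
        out[g] = (out[g] + 1) % MOD          # a wrong value in the guessed slot
        responses.append(tuple(out))
    return responses


def verify(responses, bits, f_rs):
    """Accept F(x) only if every F(r_i) matches the value precomputed in the
    preprocessing phase and all K copies of F(x) agree; otherwise return None."""
    x_outputs = []
    for (o0, o1), b, f_r in zip(responses, bits, f_rs):
        x_out, r_out = (o0, o1) if b == 0 else (o1, o0)
        if r_out != f_r:
            return None
        x_outputs.append(x_out)
    return x_outputs[0] if len(set(x_outputs)) == 1 else None


def cheat_probability(x, rs, guess):
    """Exact fraction of bit vectors b_1..b_K on which a wrong F(x) is accepted."""
    f_rs = [F(r) for r in rs]                # preprocessing: client knows F(r_i)
    all_bits = list(product((0, 1), repeat=len(rs)))
    wins = 0
    for bits in all_bits:
        accepted = verify(cheating_server(encode(x, rs, bits), guess), bits, f_rs)
        if accepted is not None and accepted != F(x):
            wins += 1
    return Fraction(wins, len(all_bits))


if __name__ == "__main__":
    print(cheat_probability(5, [11, 22, 33], guess=(0, 1, 0)))
```

Enumerating all $2^K$ bit vectors, instead of sampling them, gives the exact cheating probability for a fixed guess, at the cost of being practical only for small $K$.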

One  might  hope  that  we  can  apply  this  protocol  directly  by  having  the  clients  simulate  the  single  client  using 
multiparty  computation.  More  specifically:  the  clients  jointly  generate  the  pre-processing  information  needed  in 
the  single-client  case  in  the  pre-processing  phase  of  the  protocol.  In  the  online  phase,  the  clients  jointly  generate 
the  message  sent  by  the  single  client  (using  a  secure  computation  protocol)  and  similarly,  in  the  offline  phase, 
the clients jointly run a secure computation to verify the results of the computation sent by the server. The main issue with this approach is that it requires the clients to interact (heavily) with each other in the online phase, which, as discussed earlier, we believe to be a significant drawback.

A Failed Approach. Towards that end, we consider the following approach. Let us consider the two-client setting first (where client $D_i$ holds input $x_i$). As before, the clients jointly generate the information needed for pre-processing via a secure computation protocol. Our first idea is to have each client independently generate a bit $b_i$ in the online phase and send either $(\mathrm{Enc}_{pk}(x_i), \mathrm{Enc}_{pk}(r_i))$ or $(\mathrm{Enc}_{pk}(r_i), \mathrm{Enc}_{pk}(x_i))$ to the server. Now, instead of computing two outputs, the server computes four ciphertexts: since it does not know the bits $b_1$ and $b_2$, it computes ciphertexts corresponding to $F(x_1, x_2)$, $F(x_1, r_2)$, $F(r_1, x_2)$, and $F(r_1, r_2)$. The clients can then, in the offline phase, verify the appropriate responses of the server and accept $F(x_1, x_2)$ if all checks succeed.

Unfortunately, this solution completely fails in the case when a client (say $D_2$) colludes with the server. Very briefly, the main problem with the above approach is that it does not guarantee input independence, a key requirement for a secure computation protocol. To see the concrete problem, recall that in order to realize the standard real/ideal paradigm for secure computation, we need to construct a simulator that simulates the view of the adversary in such a manner that the output distributions of the real and ideal world executions are indistinguishable. The standard way such a simulator works is by “extracting” the input of the adversary in order to compute the “correct” protocol output (by querying the ideal functionality), and then “enforcing” this output in the protocol. The main issue that arises in the above approach is that we cannot guarantee that the adversary’s input extracted by the simulator is consistent with the output of the honest client in the real world execution. In particular, note that in the real world execution, the clients simply check whether the output of the server contains the correct $F(r_1, r_2)$ value, and if the check succeeds, then they decrypt the appropriate output value and accept it as their final output. Then, to argue indistinguishability of the real and ideal world executions, we would need to argue that the simulator extracts the specific input of the corrupted client that the (colluding) worker used to compute the output. However, the above approach provides no way of enforcing this requirement. As such, a colluding server and client pair can lead the simulator to compute an output which is inconsistent with the output generated by the server, in which case the simulator must abort. Yet, in the real world execution, the honest client would have accepted the output of the server. Thus, the simulator fails to generate an indistinguishable view of the adversary.

Indeed, the above attack demonstrates that when requiring simulation-based security against malicious coalitions, current techniques for single-client verifiable computation are not sufficient and new ideas are needed. The main contribution of our paper is a technique to overcome the above problem.

Our Solution. On careful inspection of the above approach, one can observe that it does provide a weak form of guarantee, namely that the server correctly computes the function $F$; however, $F$ is computed correctly w.r.t. “some” input for the corrupted client. In particular, the input of the corrupted client may not be chosen independently of the honest client’s input.

In order to solve our problem, our main idea is to leverage the above weak guarantee by changing the functionality that is being delegated to the worker. Roughly speaking, we “delegate the delegation functionality”. More concretely, we change the delegation functionality to $G(X, Y) = (\mathrm{Eval}_{pk}(X, Y, F), X, Y)$, where $X$ and $Y$ are encryptions of the inputs $x$ and $y$ of the clients under the public key $pk$ of the FHE scheme.

In order to understand why the above solution works, let us first consider an intermediate attempt where we delegate the functionality $\tilde{F}(x, y) = (F(x, y), x, y)$. The underlying idea is to use the weak correctness guarantee (discussed above) to validate the input of the corrupted client. In more detail, note that if we delegate the functionality $\tilde{F}$, then in the real world execution, we can have the clients check (during the verification protocol) that the output value contains the same $y$ value that is extracted by the simulator. A priori, it may seem that this approach should indeed solve the above problem, as we obtain a guarantee that the input of the malicious party in the output value ($y$) was indeed the same input used in the computation (since we are guaranteed correctness of the computation). Unfortunately, a subtle issue arises when trying to argue correctness of the simulator. Note that the above check involves decrypting the output. While this is indeed possible in the real world execution, it is not clear how to perform the same check during simulation. Indeed, in the proof of security, we would need to rely on the semantic security of the underlying FHE scheme, which conflicts with performing decryption. We remark that the natural approach of having the simulator perform a related check (e.g., using the Eval operation of the FHE scheme to match the ciphertexts) rather than perform decryptions can be defeated by specific attacks by the adversary. We do not elaborate on this issue further here, but note that in order to argue correctness of simulation, we need to ensure that the verification checks performed in the honest execution and in the simulated execution are the same.

The above issue is resolved by delegating the functionality $G$ (described above) instead of the intermediate functionality $(F(x, y), x, y)$. The key idea is that instead of performing the aforementioned consistency check via decryption (of the single layer of FHE), we can now perform a similar check by first decrypting the outer layer, then re-encrypting the input of the client (the clients are supposed to provide the encryption randomness in the verification protocol), and matching it with the values $X$ and $Y$, respectively. The simulator can now be in possession of the outer-layer FHE secret key, since we can rely on the semantic security of the inner-layer FHE.
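A single-layer toy can illustrate why echoing the inputs in $G$ helps: the verifier re-encrypts each claimed input with the claimed randomness and compares the result with the ciphertext the worker actually used. The sketch below is our own illustration under loud assumptions: it substitutes textbook exponential ElGamal (only additively homomorphic, with insecure toy parameters) for FHE, takes $F(x, y) = x + y$, omits the outer encryption layer and the secure computation inside which the real check runs, and all function names are hypothetical.

```python
# Toy exponential ElGamal over the order-509 subgroup of Z_1019^*.
# Parameters are far too small to be secure; illustration only.
P = 1019   # safe prime, 1019 = 2 * 509 + 1
Q = 509    # prime order of the subgroup of quadratic residues
GEN = 4    # 4 = 2^2 is a non-identity quadratic residue, so it generates that subgroup


def keygen(sk: int) -> int:
    return pow(GEN, sk, P)


def enc(pk: int, m: int, rho: int) -> tuple:
    """Enc_pk(m; rho) = (g^rho, g^m * pk^rho); deterministic once rho is fixed."""
    return (pow(GEN, rho, P), pow(GEN, m, P) * pow(pk, rho, P) % P)


def eval_add(c1: tuple, c2: tuple) -> tuple:
    """Homomorphic evaluation of F(x, y) = x + y (mod Q): componentwise product."""
    return (c1[0] * c2[0] % P, c1[1] * c2[1] % P)


def dec(sk: int, c: tuple) -> int:
    g_m = c[1] * pow(c[0], -sk, P) % P   # negative exponent needs Python 3.8+
    for m in range(Q):                   # brute-force the small discrete log
        if pow(GEN, m, P) == g_m:
            return m
    raise ValueError("not a valid ciphertext")


def delegated_g(X: tuple, Y: tuple) -> tuple:
    """Toy G(X, Y) = (Eval_pk(X, Y, F), X, Y): the worker echoes its inputs."""
    return (eval_add(X, Y), X, Y)


def check_and_decode(sk: int, out: tuple, x: int, rho_x: int, y: int, rho_y: int):
    """Re-encrypt each claimed input with its claimed randomness and compare."""
    Z, X_echo, Y_echo = out
    pk = keygen(sk)
    if X_echo != enc(pk, x, rho_x) or Y_echo != enc(pk, y, rho_y):
        return None                      # plays the role of outputting ⊥
    return dec(sk, Z)


if __name__ == "__main__":
    sk = 123
    pk = keygen(sk)
    X, Y = enc(pk, 20, 77), enc(pk, 22, 301)
    print(check_and_decode(sk, delegated_g(X, Y), 20, 77, 22, 301))  # 42
    # If the worker evaluated on a ciphertext other than the one the client
    # claims in verification (here y = 5 instead of 22), the check fails:
    print(check_and_decode(sk, delegated_g(X, enc(pk, 5, 301)), 20, 77, 22, 301))  # None
```

The check forces the input used inside the evaluation to be the very one a client (honest or corrupted) opens during verification, which is the consistency property the simulator needs.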

The  above  description  is  somewhat  oversimplified  for  lack  of  space.  We  refer  the  reader  to  the  protocol 
description  for  more  details. 

Handling more than two clients. It is easy to see that the obvious extension of the above protocol to the case of $n$ clients requires the server to compute and return $2^n$ values. Here we informally describe how to avoid this problem. Recall that each client picked a random bit $b_i$ and sent either $(x_i, r_i)$ or $(r_i, x_i)$, both doubly encrypted, depending on the bit $b_i$. To avoid the exponential blow-up, we have the $n$ clients jointly generate random bits $b_1, \ldots, b_n$ such that exactly one $b_i = 1$ (where $i$ is random in $[n]$) and all other $b_j = 0$ for $j \neq i$. (Doing this without interaction is a significant challenge, but we show that it can be achieved.) Now, the server only needs to compute $2n$ ciphertexts: $n$ that encrypt $F(x_1, \ldots, x_n)$ and $n$ that encrypt $F(r_1, \ldots, r_n)$. In this case, we can prove security of our protocol as long as at most a constant fraction $\alpha$ of the clients are corrupted (even if they collude with the worker). More details are given in Section 5.1.

1.4  Related  Work 

The problem of efficiently verifying computations performed by an untrusted party has been extensively studied in the literature for the case of a single client outsourcing the computation to a server. Various approaches have been used: interactive solutions [GKR08], SNARG-based solutions [BCC88, Kil92, Mic94, BCCT12, GLR11, DFH12, Gro10, Lip12, GGPR12], and solutions in the pre-processing model [GGP10, CKV10, AIK10]. The works of [KMR11] and [CKKC12] consider outsourcing of multi-party computation, but only in the setting where adversarial parties do not collude with each other, or in the semi-honest setting. Finally, with a focus on minimizing interaction, the work of [CKKC13] considers non-interactive multi-client verifiable computation, but only considers soundness against a malicious server when the clients are honest, and, independently, privacy against a malicious server and a malicious client. For a more detailed description of related work, see Appendix A.

2  Preliminaries 

2.1  Our  Model 

In this section, we present in detail the computation and communication model as well as the security definition considered in this paper. For simplicity, we first deal with the two-party case of our protocol and then show how to extend it to the multi-party case. In other words, we consider the setting of 2 clients (or delegators) $\mathcal{D} = \{D_1, D_2\}$ who wish to jointly outsource the computation of any PPT function over their private inputs to a worker $W$. Specifically, we consider the case where the clients wish to perform arbitrarily many evaluations of a function $F$ of their choice over different sets of inputs. Unlike the standard multiparty computation setting, we wish to ensure that the computation of each client is independent of the amount of computation needed to compute $F$ from scratch. The worker $W$, however, is expected to perform computation proportional to the size of the circuit representing $F$. Similar to standard multiparty computation, our corruption model allows an adversary to maliciously corrupt the worker and one of the clients (a subset of the clients in the multi-party case). Informally speaking, we require that no such adversary learns anything more than what can be learned from the inputs of the corrupted clients and their outputs (and any additional auxiliary information that the adversary may have).

Note that the above problem is non-trivial even in the setting where a single client wishes to outsource its computation to a worker. Specifically, all the known solutions require either the random oracle model, or appropriate non-black-box assumptions, or allow for a one-time pre-processing phase where the computation of the client(s) is allowed to be proportional to the size of the circuit representing $F$.¹ In this work, we wish to rely only on standard cryptographic assumptions; as such, we choose to work in the pre-processing model.

We  now  proceed  to  give  a  formal  description  of  our  model  in  the  remainder  of  this  section.  We  first  present 
the syntax for a two-party verifiable computation protocol. We then define security as per the standard
real/ideal  paradigm  for  secure  computation.  Throughout  this  work,  for  simplicity  of  exposition,  we  assume  that 
the  function  to  be  evaluated  on  the  clients’  inputs  gives  the  same  outputs  to  all  clients.  We  note  that  the  general 
scenario  where  the  function  outputs  may  be  different  for  each  client  can  be  handled  using  standard  techniques. 

Two-party Verifiable Computation Protocol. A 2-party verifiable computation protocol between 2 clients (or delegators) $\mathcal{D} = \{D_1, D_2\}$ and a worker $W$ consists of three phases, namely, (a) pre-processing phase, (b) online phase, and (c) offline phase. Here, the pre-processing phase is executed only once, while the online phase is executed every time the clients wish to compute $F$ over a new set of inputs. The offline phase may also be executed each time following the online phase. Alternatively (and crucially), multiple executions of the offline phase can be batched together (i.e., the clients can outsource several (not fixed a priori) computations of $F$ on different sets of inputs before they verify and obtain the results of these computations). We now describe the three phases in detail:

Preprocessing Phase: This is a one-time stage in which the clients compute some public, as well as private, information associated with the function $F$. The computation of each client in this phase is allowed to be proportional to the amount of computation needed to compute $F$ from scratch. Clients are allowed to interact with each other in an arbitrary manner in this phase. Note that this phase is independent of the clients’ inputs and is executed only once.

Online Phase: In this phase, the clients interact with the server in a single round of communication. Let $x_i$ be the input held by client $D_i$. In order to outsource the computation of $F$ (on a chosen set of inputs $x_1, x_2$) to the worker $W$, each client $D_i$ individually prepares some public and private information about $x_i$ and sends the public information to the worker. On receiving the encoded inputs from each client, $W$ computes an encoded output and sends it to all the clients. Note that the clients do not directly interact with each other in this phase. This ensures that clients do not need to interact (and be available online) whenever they wish to perform some computation.

Offline Phase: On receiving the encoded output from the worker, the clients interact with each other to compute the actual output $F(x_1, x_2)$ and verify its correctness. We require that the computation of each client in this phase be independent of $F$. As we will see, we also minimize the interaction between the clients in this phase.

Note that the above protocol allows only a single round of interaction between the clients and the worker. We require that the computation time of the clients in the online and offline phases above be, in total, less than the time required to compute $F$ from scratch (in fact, in our protocol the computational complexity will be independent of $F$). Furthermore, we would like the worker to perform computation that is roughly the same as computing $F$.

¹In fact, this is the state of affairs even if we relax the security requirement to output correctness only and do not require input privacy.




11.  How  to  Delegate  Secure  Multiparty  Computation  to  the  Cloud 


This completes the description of our computation and communication model. We now formally describe the key requirements from a 2-party verifiable computation protocol. Intuitively, we require that a verifiable computation protocol be both efficient and secure; we formally define both requirements below.

Security.  To  formally  define  security,  we  turn  to  the  real/ideal  paradigm  for  secure  computation.  We  stress  that 
we allow for an adversary that may corrupt either $D_1$ or $D_2$ (or neither) as well as the worker. Since we consider
the  case  of  dishonest  majority,  we  only  obtain  security  with  abort:  i.e.,  the  adversary  first  receives  the  function 
output,  and  then  chooses  whether  the  honest  parties  also  learn  the  output,  or  to  prematurely  abort.  Further,  we 
only  consider  static  adversaries,  who  choose  the  parties  they  wish  to  corrupt  at  the  beginning  of  the  protocol. 
Finally,  we  consider  computational  security,  and  thus  we  restrict  our  attention  to  PPT  adversaries.  We  formally 
describe  the  ideal  and  real  models  of  computation  and  then  give  our  security  definition  in  Appendix  B. 


Efficiency. Let the time required to compute the function $F$ be denoted by $t_F$; we say that the time complexity of $F$ is $t_F$.

Definition  1  We  say  that  a  verifiable  computation  protocol  for  computing  a  function  F  is  efficient  if  it  satisfies 
the  following  conditions: 

- The running time of every client in the pre-processing phase is $O(t_F)$.

- The running time of every client in the online phase is $o(t_F)$.

- The running time of the worker in the online phase is $O(t_F)$.

- The running time of every client in the offline phase is $o(t_F)$.

2.2  Building  Blocks 

In our construction, we make use of several cryptographic primitives, listed as follows. We require pseudorandom functions, statistically binding commitments, fully homomorphic encryption, multi-key fully homomorphic encryption, the single-client verifiable computation protocol from [CKV10], and a standard secure computation protocol. Due to lack of space, we describe these building blocks in detail in Appendix C.

3  Two-party  Verifiable  Computation  Protocol 

We now describe our two-party verifiable computation protocol $\Pi$ for securely computing a functionality $F$.
We  proceed  in  two  steps.  First,  in  Section  3.1,  we  describe  a  one-time  verifiable  computation  protocol  where 
the  pre-processing  stage  is  useful  only  for  one  computation.  Then,  in  Section  3.2,  we  show  how  our  one-time 
construction  can  be  modified  to  allow  for  multiple  uses  of  the  pre-processing  phase. 

3.1  One-Time  Verifiable  Computation  Protocol 

Let $D_1, D_2$ denote the two clients (or delegators) and $W$ denote the worker.

Function outsourced to W. In order to securely compute the function $F$ over their private inputs, $D_1$ and $D_2$ outsource the computation of the following function $\mathcal{G}$ to $W$:

Inputs: $X_1, X_2$, where $X_i \leftarrow \mathrm{Enc}_{pk}(x_i)$.

Output: $\mathcal{G}(X_1, X_2) = (\mathrm{Eval}_{pk}(X_1, X_2, F), X_1, X_2)$.

We now proceed to describe the three phases of our protocol $\Pi$.




I. Pre-processing phase. In this phase, the clients interact with each other to perform the following computations:

1. $D_1$ and $D_2$ engage in the execution of a standard secure computation protocol $\Pi_{\mathrm{FHE}}$ to compute the (randomized) functionality $\mathcal{F}_{\mathrm{FHE}}$ described as follows:

• Generate key pairs $(sk, pk) \leftarrow \mathrm{Gen}(1^\kappa)$ and $(SK, PK) \leftarrow \mathrm{Gen}(1^\kappa)$ for the FHE scheme $(\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$.

• Compute 2-out-of-2 shares of the FHE secret keys $sk, SK$. That is, compute $sk_1, sk_2$ s.t. $sk_1 \oplus sk_2 = sk$, and $SK_1, SK_2$ s.t. $SK_1 \oplus SK_2 = SK$.

• Output $(pk, PK, sk_i, SK_i)$ to $D_i$.
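The 2-out-of-2 sharing step is plain XOR arithmetic. The following sketch treats each FHE secret key as an opaque byte string (an assumption: a concrete FHE scheme may represent keys as vectors or polynomials and share them additively modulo $q$ instead), and it shows only the arithmetic; in the protocol, the sharing is computed inside $\Pi_{\mathrm{FHE}}$, so neither client ever sees the full key. The function names are hypothetical.

```python
import secrets
from typing import Tuple


def share_2of2(secret: bytes) -> Tuple[bytes, bytes]:
    """Split a secret into two XOR shares; either share alone is uniformly random."""
    share1 = secrets.token_bytes(len(secret))
    share2 = bytes(s ^ a for s, a in zip(secret, share1))
    return share1, share2


def reconstruct(share1: bytes, share2: bytes) -> bytes:
    """Recover sk = sk_1 XOR sk_2."""
    if len(share1) != len(share2):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(share1, share2))


if __name__ == "__main__":
    sk = secrets.token_bytes(32)   # stand-in for a serialized FHE secret key
    sk1, sk2 = share_2of2(sk)
    assert reconstruct(sk1, sk2) == sk
```

In the offline phase, $\mathcal{F}_{\mathrm{ver}}$ performs exactly this reconstruction on the shares $sk_i, SK_i$ supplied by the two clients before decrypting.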

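2. $D_1$ and $D_2$ engage in the execution of a standard secure computation protocol $\Pi_{\mathrm{PRF}}$ to compute the (randomized) functionality $\mathcal{F}_{\mathrm{PRF}}$ described as follows:

• Sample keys $K_1$ and $K_2$ for a pseudorandom function $\mathrm{PRF}: \{0,1\}^\kappa \times \{0,1\}^\kappa \to \{0,1\}$.

• For every $j \in [2]$, compute $(c_j^{\mathrm{prf}}, d_j^{\mathrm{prf}}) \leftarrow \mathrm{COM}(K_j)$.

• Output $(\{c_j^{\mathrm{prf}}\}_{j \in [2]}, d_i^{\mathrm{prf}}, K_i)$ to $D_i$.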
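How the bits $b_{i,j}$ are derived in the online phase, and re-derived inside $\mathcal{F}_{\mathrm{ver}}$ after the committed key is opened, can be sketched as follows. Two substitutions are assumptions of this sketch: HMAC-SHA256 truncated to one bit plays the role of PRF, and a SHA-256 hash commitment plays the role of COM. The latter is only computationally binding (and hiding only heuristically), whereas the construction calls for statistically binding commitments, so it should not be read as a drop-in instantiation. The fixed-width encoding of $s \| j$ is also our own choice.

```python
import hashlib
import hmac
import secrets


def prf_bit(key: bytes, session: int, j: int) -> int:
    """b_{i,j} = PRF_{K_i}(s || j), instantiated here with HMAC-SHA256 (one output bit)."""
    msg = session.to_bytes(8, "big") + j.to_bytes(8, "big")  # fixed-width encoding of s || j
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def commit(value: bytes):
    """Hash-based commitment c = SHA-256(d || value); returns (c, decommitment)."""
    d = secrets.token_bytes(32)  # fixed length, so d || value parses unambiguously
    return hashlib.sha256(d + value).digest(), (d, value)


def open_commitment(c: bytes, decommitment):
    """Return the committed value, or None (standing in for ⊥) if the opening fails."""
    d, value = decommitment
    ok = hmac.compare_digest(hashlib.sha256(d + value).digest(), c)
    return value if ok else None


if __name__ == "__main__":
    K1 = secrets.token_bytes(32)
    c1, d1 = commit(K1)
    assert open_commitment(c1, d1) == K1
    print([prf_bit(K1, session=1, j=j) for j in range(8)])
```

Because the bits depend only on $K_i$, the session number and $j$, a client can compute them alone in the online phase, and $\mathcal{F}_{\mathrm{ver}}$ can recompute the same bits once the commitment to $K_i$ opens correctly.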

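3. $D_1$ and $D_2$ engage in the execution of a standard secure computation protocol $\Pi_{\mathrm{test}}$ to compute the (randomized) functionality $\mathcal{F}_{\mathrm{test}}$ described as follows. $\mathcal{F}_{\mathrm{test}}$ takes as input the FHE public keys $pk$ and $PK$ (as computed above) from $D_1, D_2$ and computes the following:

• For every $i \in [2], j \in [n]$, generate a random string $r_{i,j}$ and compute $R_{i,j} \leftarrow \mathrm{Enc}_{PK}(\mathrm{Enc}_{pk}(r_{i,j}))$.

• For every $j \in [n]$,

(a) Compute $\mathrm{secret}_j = \mathrm{Eval}_{PK}(R_{1,j}, R_{2,j}; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G}))$.

(b) Compute $(c_j^{\mathrm{test}}, d_j^{\mathrm{test}}) \leftarrow \mathrm{COM}(\mathrm{secret}_j)$.

(c) Choose random strings $d_{1,j}^{\mathrm{test}}, d_{2,j}^{\mathrm{test}}$ s.t. $d_j^{\mathrm{test}} = d_{1,j}^{\mathrm{test}} \oplus d_{2,j}^{\mathrm{test}}$.

• Output $\{R_{i,j}, c_j^{\mathrm{test}}, d_{i,j}^{\mathrm{test}}\}_{j \in [n]}$ to $D_i$.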

(Note  that  the  three  steps  above  can  be  combined  into  a  single  secure  computation  protocol  execution.  We  choose 
to  split  them  into  separate  executions  for  simplicity  of  explanation  and  proof.) 

II. Online phase. In this phase, the clients interact with the worker in a single round of communication to compute the functionality $\mathcal{G}$. For simplicity of exposition, we assume that the public keys $(pk, PK)$ were given to $W$ at the end of the pre-processing phase; we do not include them in the description below.

More  specifically,  this  phase  proceeds  as  follows: 

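$D_i \to W$: Let $x_i$ denote the private input of $D_i$. The client $D_i$ performs the following steps:

1. For every $j \in [n]$,

• Compute $X_{i,j} \leftarrow \mathrm{Enc}_{PK}(\mathrm{Enc}_{pk}(x_i; \rho_{i,j}))$, where $\rho_{i,j}$ is fresh encryption randomness.

• Let $s$ be the session number. Then, compute the bit $b_{i,j} \leftarrow \mathrm{PRF}_{K_i}(s \| j)$. Let $(v_{i,j}^0, v_{i,j}^1)$ be such that $v_{i,j}^{1-b_{i,j}} = X_{i,j}$ and $v_{i,j}^{b_{i,j}} = R_{i,j}$.

$D_i$ sends the tuple $\{v_{i,j}^0, v_{i,j}^1\}_{j=1}^{n}$ to $W$.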

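$W \to (D_1, D_2)$: On receiving the tuples $\{v_{i,j}^0, v_{i,j}^1\}_{j=1}^{n}$ from each client $D_i$, $W$ performs the following steps. For every $j \in [n]$, homomorphically compute the following four values:

1. $z_j^0 = \mathrm{Eval}_{PK}(v_{1,j}^0, v_{2,j}^0; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G}))$.

2. $z_j^1 = \mathrm{Eval}_{PK}(v_{1,j}^0, v_{2,j}^1; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G}))$.

3. $z_j^2 = \mathrm{Eval}_{PK}(v_{1,j}^1, v_{2,j}^0; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G}))$.

4. $z_j^3 = \mathrm{Eval}_{PK}(v_{1,j}^1, v_{2,j}^1; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G}))$.

$W$ sends the tuples $(v_{i,j}^0, v_{i,j}^1, z_j^\ell)$ to both clients, where $\ell \in \{0, \ldots, 3\}$, $i \in [2]$, $j \in [n]$.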

III. Offline Phase. In this phase, the clients interact with each other to decode the output from the worker and verify its correctness. More specifically, $D_1$ and $D_2$ engage in an execution of a standard secure computation protocol $\Pi_{\mathrm{ver}}$ to compute the functionality $\mathcal{F}_{\mathrm{ver}}$ described below.

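Functionality $\mathcal{F}_{\mathrm{ver}}$. The input of client $D_i$ to $\mathcal{F}_{\mathrm{ver}}$ is the following set of values:

Public values: $s, pk, PK, c_j^{\mathrm{test}}, c_i^{\mathrm{prf}}, (v_{i,j}^0, v_{i,j}^1), z_j^\ell$, where $i \in [2]$, $j \in [n]$, $\ell \in \{0, \ldots, 3\}$.

Private values: $x_i, \rho_{i,j}, sk_i, SK_i, d_{i,j}^{\mathrm{test}}, d_i^{\mathrm{prf}}$, where $j \in [n]$.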

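On receiving the above set of inputs from $D_1$ and $D_2$, $\mathcal{F}_{\mathrm{ver}}$ computes the following:

1. Match all the public input values of $D_1$ and $D_2$. If any of the corresponding values are not equal, output $\perp$.

2. For every $i \in [2]$, if $\perp \leftarrow \mathrm{OPEN}(c_i^{\mathrm{prf}}, d_i^{\mathrm{prf}})$, then output $\perp$. Otherwise, let $K_i \leftarrow \mathrm{OPEN}(c_i^{\mathrm{prf}}, d_i^{\mathrm{prf}})$.

3. For every $j \in [n]$, do the following:

(a) If $\perp \leftarrow \mathrm{OPEN}(c_j^{\mathrm{test}}, d_{1,j}^{\mathrm{test}} \oplus d_{2,j}^{\mathrm{test}})$, then output $\perp$. Otherwise, let $\mathrm{secret}_j^* = \mathrm{OPEN}(c_j^{\mathrm{test}}, d_{1,j}^{\mathrm{test}} \oplus d_{2,j}^{\mathrm{test}})$.

(b) For every $i \in [2]$, compute $b_{i,j} = \mathrm{PRF}_{K_i}(s \| j)$. Let $p_j = 2b_{1,j} + b_{2,j}$. If $\mathrm{secret}_j^* \neq z_j^{p_j}$, then output $\perp$.

(c) Compute $(Y_j[0], Y_j[1], Y_j[2]) \leftarrow \mathrm{Dec}_{SK_1 \oplus SK_2}(z_j^{3-p_j})$. For every $i \in [2]$, do the following:

• If $Y_j[i] \neq \mathrm{Dec}_{SK_1 \oplus SK_2}(v_{i,j}^{1-b_{i,j}})$, then output $\perp$.

• If $Y_j[i] \neq \mathrm{Enc}_{pk}(x_i; \rho_{i,j})$, then output $\perp$.

4. Output $y = \mathrm{Dec}_{sk_1 \oplus sk_2}(Y_1[0])$.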

This  completes  the  description  of  our  protocol.  We  now  claim  the  following: 

Theorem 1 Assuming the existence of a fully homomorphic encryption scheme, protocol $\Pi$ is a secure and efficient one-time verifiable computation protocol for any efficiently computable functionality $F$.


It follows by inspection that protocol $\Pi$ is an efficient verifiable computation protocol as per Definition 1. In Section 4, we prove its security as per Definition 2.

3.2  Many-time  Verifiable  Computation 

We now explain how our one-time verifiable computation protocol $\Pi$ described in the previous subsection can be extended to allow for multiple uses of the pre-processing phase.

Similar  to  [GGP10,  CKV10],  we  achieve  reusability  of  the  pre-processing  phase  by  executing  the  online 
phase  under  an  additional  layer  of  FHE  (and  performing  necessary  decryptions  in  the  offline  phase).  However, 
since  we  wish  to  avoid  interaction  between  the  clients  during  the  online  phase,  a  priori,  it  is  not  clear  how  the 
new  public  key  must  be  chosen  during  each  execution  of  the  online  phase.  We  resolve  this  issue  by  making  use 
of a multi-key fully homomorphic encryption scheme (MFHE) (see Section C.3 for the definition). More specifically, we
make  the  following  changes  to  our  one-time  verifiable  computation  protocol: 
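Online Phase:

1. Each client $D_i$ generates a key pair $(MPK_i, MSK_i) \leftarrow \mathrm{MGen}(1^\kappa)$ for the MFHE scheme. For every $j \in [n]$, it computes $\widetilde{X}_{i,j} \leftarrow \mathrm{MEnc}_{MPK_i}(X_{i,j})$ and $\widetilde{R}_{i,j} \leftarrow \mathrm{MEnc}_{MPK_i}(R_{i,j})$, where $X_{i,j}$ and $R_{i,j}$ are as defined earlier. The values sent to $W$ are $(v_{i,j}^0, v_{i,j}^1)$, where $v_{i,j}^{1-b_{i,j}} = \widetilde{X}_{i,j}$ and $v_{i,j}^{b_{i,j}} = \widetilde{R}_{i,j}$.

2. The worker $W$ now computes the following values: for every $j \in [n]$,

(a) $z_j^0 = \mathrm{MEval}_{MPK_1, MPK_2}(v_{1,j}^0, v_{2,j}^0; \mathrm{Eval}_{PK}(\cdot, \cdot; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G})))$.

(b) $z_j^1 = \mathrm{MEval}_{MPK_1, MPK_2}(v_{1,j}^0, v_{2,j}^1; \mathrm{Eval}_{PK}(\cdot, \cdot; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G})))$.

(c) $z_j^2 = \mathrm{MEval}_{MPK_1, MPK_2}(v_{1,j}^1, v_{2,j}^0; \mathrm{Eval}_{PK}(\cdot, \cdot; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G})))$.

(d) $z_j^3 = \mathrm{MEval}_{MPK_1, MPK_2}(v_{1,j}^1, v_{2,j}^1; \mathrm{Eval}_{PK}(\cdot, \cdot; \mathrm{Eval}_{pk}(\cdot, \cdot, \mathcal{G})))$.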

Offline Phase:

1. Each client $D_i$ uses $(MPK_1, MPK_2)$ and $MSK_i$ as additional inputs in the protocol $\Pi_{\mathrm{ver}}$.

2. For every $j \in [n]$, the functionality $\mathcal{F}_{\mathrm{ver}}$ computes the values $\widetilde{z}_j^{p_j} \leftarrow \mathrm{MDec}_{MSK_1, MSK_2}(z_j^{p_j})$ and $\widetilde{z}_j^{3-p_j} \leftarrow \mathrm{MDec}_{MSK_1, MSK_2}(z_j^{3-p_j})$, and then performs the same computations as described earlier.

This  completes  the  description  of  the  many-time  verifiable  computation  protocol.  We  discuss  its  security  in 
Section  D.2. 

4  Proof  of  Security 

In order to prove Theorem 1, we first construct a PPT simulator $S$ that simulates the view of any adversary $A$ who corrupts one of the clients and the worker. We then argue that the output distributions in the real and ideal world executions are computationally indistinguishable. We start by describing the construction of $S$ in Section 4.1. We complete the proof by arguing the correctness of simulation in Appendix D.1.

4.1  Description  of  Simulator 

Without loss of generality, below we assume that the client $D_2$ and the worker $W$ are corrupted by the adversary. We denote them as $D_2^*$ and $W^*$, respectively. The opposite case where $D_1$ and $W$ are corrupted can be handled analogously.

We  describe  how  the  simulator  S  works  in  each  of  the  three  phases: 

Pre-processing  phase: 

1. Let $S_{\mathrm{FHE}}$ denote the simulator for the two-party computation protocol $\Pi_{\mathrm{FHE}}$. In the first step, $S$ runs $S_{\mathrm{FHE}}$ with $D_2^*$ to generate a simulated execution of $\Pi_{\mathrm{FHE}}$. During the simulation, when $S_{\mathrm{FHE}}$ makes a query to the ideal functionality $\mathcal{F}_{\mathrm{FHE}}$, $S$ evaluates $\mathcal{F}_{\mathrm{FHE}}$ on its own using fresh randomness and returns the output to $S_{\mathrm{FHE}}$. Let $pk, PK$ denote the output public keys in this protocol.

2. Let $S_{\mathrm{PRF}}$ denote the simulator for the two-party computation protocol $\Pi_{\mathrm{PRF}}$. In the second step, $S$ runs $S_{\mathrm{PRF}}$ with $D_2^*$ to generate a simulated execution of $\Pi_{\mathrm{PRF}}$. During the simulation, when $S_{\mathrm{PRF}}$ makes a query to the ideal functionality $\mathcal{F}_{\mathrm{PRF}}$, $S$ evaluates $\mathcal{F}_{\mathrm{PRF}}$ on its own using fresh randomness and returns the output to $S_{\mathrm{PRF}}$. Let $(\{c_j^{\mathrm{prf}}\}_{j \in [2]}, d_i^{\mathrm{prf}}, K_i)$ denote the output of $D_i$, where each variable is defined in the same way as in the protocol description in the previous section.

3. Let $S_{\mathrm{test}}$ denote the simulator for the two-party computation protocol $\Pi_{\mathrm{test}}$. In the final step of the pre-processing phase, $S$ runs $S_{\mathrm{test}}$ with $D_2^*$ to generate a simulated execution of $\Pi_{\mathrm{test}}$. During the simulation, when $S_{\mathrm{test}}$ makes a query to the ideal functionality $\mathcal{F}_{\mathrm{test}}$, $S$ evaluates $\mathcal{F}_{\mathrm{test}}$ on its own using fresh randomness and returns the output to $S_{\mathrm{test}}$. Let $\{R_{i,j}, c_j^{\mathrm{test}}, d_{i,j}^{\mathrm{test}}\}_{j \in [n]}$ denote the output of $D_i$, where each variable is defined in the same way as in the protocol description in the previous section.
Online Phase. In this phase, $S$ works in essentially the same manner as the honest client $D_1$, except that it uses the all-zeros string $0^\kappa$ as its input and uses random bits $b_{1,j}$ (instead of pseudorandom bits).

$S$ behaves honestly for the rest of the online phase. Let $(v_{i,j}^0, v_{i,j}^1, z_j^\ell)$ be the tuple $S$ receives from $W^*$, where $\ell \in \{0, \ldots, 3\}$, $i \in [2]$, $j \in [n]$.

Offline Phase. Let $S_{\mathrm{ver}}$ denote the simulator for the two-party computation protocol $\Pi_{\mathrm{ver}}$. In the offline phase, $S$ runs the simulator $S_{\mathrm{ver}}$ with the corrupted client $D_2^*$ to generate a simulated execution of $\Pi_{\mathrm{ver}}$. At some point during the simulation, $S_{\mathrm{ver}}$ makes a query to the ideal functionality $\mathcal{F}_{\mathrm{ver}}$ with some input (say) $Z$. $S$ then does the following:

1. Parse $Z$ as follows:

Public values: $s, pk, PK, c_j^{\mathrm{test}}, c_i^{\mathrm{prf}}, (v_{i,j}^0, v_{i,j}^1), z_j^\ell$, where $i \in [2]$, $j \in [n]$, $\ell \in \{0, \ldots, 3\}$.

Private values: $x_2, \rho_{2,j}, b_{2,j}, sk_2, SK_2, d_{2,j}^{\mathrm{test}}, d_2^{\mathrm{prf}}$, where $j \in [n]$.

2. Perform the verification checks in the same manner as $\mathcal{F}_{\mathrm{ver}}$, except the check regarding the PRF key $K_1$. (In particular, use the random bits $b_{1,j}$ chosen in the online phase directly for the checks.) If any of the checks fail, then output $\perp$ to $S_{\mathrm{ver}}$. Else, if all of the checks succeed, then query the ideal functionality for $F$ on input $x_2$. Let $y$ be the output. Return $y$ as the output to $S_{\mathrm{ver}}$.

On receiving the output value $y$ from $S$, $S_{\mathrm{ver}}$ continues the simulation of $\Pi_{\mathrm{ver}}$, where it forces the output $y$ on $A$, and finally stops. At this point, $S$ stops as well and outputs the entire simulated view of $A$.

5  Extensions 

5.1  Handling  Multiple  Clients 

In this section, we show how to extend our two-party verifiable computation protocol to obtain an $n$-party verifiable computation protocol that is secure against a constant fraction ($\alpha$) of corrupted clients (who can collude arbitrarily with a corrupted server).

Let us first recall a basic idea that we used in our protocol for two-party verifiable computation. The two clients $D_1$ and $D_2$ picked random strings $r_1$ and $r_2$ in the pre-processing phase and computed an encryption of $G(r_1, r_2)$. In the online phase, $D_1$ picked a random bit $b_1$ and sent either $(x_1, r_1)$ or $(r_1, x_1)$, both doubly encrypted, depending on the bit $b_1$. Similarly, $D_2$ picked a random bit $b_2$ and sent either $(x_2, r_2)$ or $(r_2, x_2)$, both doubly encrypted, depending on the bit $b_2$. Let the ciphertexts sent by $D_1$ be denoted by $(v_1^0, v_1^1)$ and those sent by $D_2$ by $(v_2^0, v_2^1)$. The server $W$ then responded with 4 ciphertexts, computed homomorphically, namely:

$$z^0 = \mathsf{MEval}_{PK_1,PK_2}\bigl(v_1^0, v_2^0;\ \mathsf{Eval}_{pk}(\cdot,\cdot,G)\bigr), \qquad z^1 = \mathsf{MEval}_{PK_1,PK_2}\bigl(v_1^0, v_2^1;\ \mathsf{Eval}_{pk}(\cdot,\cdot,G)\bigr),$$
$$z^2 = \mathsf{MEval}_{PK_1,PK_2}\bigl(v_1^1, v_2^0;\ \mathsf{Eval}_{pk}(\cdot,\cdot,G)\bigr), \qquad z^3 = \mathsf{MEval}_{PK_1,PK_2}\bigl(v_1^1, v_2^1;\ \mathsf{Eval}_{pk}(\cdot,\cdot,G)\bigr).$$

(In the real protocol, the clients actually picked $k$ such random strings each, and the server sent $4k$ homomorphically evaluated ciphertexts back to the clients; for simplicity, we consider only one such instance here.) Note that, out of the 4 ciphertexts, one contained an encryption of $G(r_1, r_2)$, which was used for verification, and one contained an encryption of $G(x_1, x_2)$, which was used to obtain the result of the computation.
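To see why exactly one of the four ciphertexts carries $G(x_1, x_2)$ and exactly one carries $G(r_1, r_2)$, it helps to drop both encryption layers and track only which plaintext sits in each slot. The Python sketch below does this under two assumptions of ours that the text does not fix: bit $b = 0$ means a client sends $(x, r)$ and $b = 1$ means $(r, x)$, and $z^{\ell}$ with $\ell = 2c_1 + c_2$ combines $v_1^{c_1}$ with $v_2^{c_2}$. `G` is an arbitrary stand-in function; no encryption is performed.

```python
from itertools import product


def client_pair(x, r, b):
    """(v^0, v^1): the real input sits in slot b, the random string in slot 1 - b."""
    return (x, r) if b == 0 else (r, x)


def server_outputs(v1, v2, G):
    """z^0..z^3 in the order (c1, c2) = 00, 01, 10, 11, i.e. z^(2*c1 + c2) = G(v1[c1], v2[c2])."""
    return [G(v1[c1], v2[c2]) for c1, c2 in product((0, 1), repeat=2)]


def useful_positions(b1, b2):
    """Indices of G(x1, x2) and G(r1, r2) among z^0..z^3, given the hidden bits."""
    p = 2 * b1 + b2
    return p, 3 - p


def pair_up(a, b):
    """Stand-in for G that records which plaintexts were combined."""
    return (a, b)


# Plaintext check over all four choices of the hidden bits.
for b1, b2 in product((0, 1), repeat=2):
    z = server_outputs(client_pair("x1", "r1", b1), client_pair("x2", "r2", b2), pair_up)
    px, pr = useful_positions(b1, b2)
    assert z[px] == ("x1", "x2") and z[pr] == ("r1", "r2")
```

Both useful positions depend on both hidden bits, which is why a server that cannot see $b_1, b_2$ must return all four evaluations.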

Now, suppose we tried to extend the above idea as is to the $n$-party case. That is, the $n$ clients pick $r_1, \ldots, r_n$ (client $D_i$ picking $r_i$) in the pre-processing phase and compute $G(r_1, \ldots, r_n)$. In the online phase, each client $D_i$ picks a private random bit $b_i$ and sends either $(x_i, r_i)$ or $(r_i, x_i)$, both doubly encrypted, depending on the bit $b_i$. Unfortunately, since the server does not (and cannot) know the bits $b_i$, he must now send back $2^n$ ciphertexts, one for every way of choosing one ciphertext per client, in order to be sure that he has sent back both an encryption of $G(r_1, \ldots, r_n)$ (to be used for verification by the $n$ clients) and an encryption of $G(x_1, \ldots, x_n)$ (to be used by the $n$ clients to obtain the result of the computation). This solution therefore has a complexity exponential in $n$, and hence not polynomial, for all parties.

Our idea to solve the above problem is to have the $n$ clients jointly generate random bits $b_1, \ldots, b_n$ such that exactly one $b_i = 1$ (where $i$ is random in $[n]$) and all other $b_j = 0$, $j \neq i$. Now, the server only needs to compute $2n$ ciphertexts in order to be sure that he has returned the required ones. In other words, $W$ computes the following $2n$ ciphertexts:

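1. The $n$ ciphertexts, one of which encrypts $G(x_1, \ldots, x_n)$, namely:

(a) $z^1 = \mathsf{MEval}_{PK_1,\ldots,PK_n}\bigl(v_1^1, v_2^0, \ldots, v_n^0;\ \mathsf{Eval}_{pk}(\cdot,\ldots,\cdot,G)\bigr)$,

(b) $z^2 = \mathsf{MEval}_{PK_1,\ldots,PK_n}\bigl(v_1^0, v_2^1, v_3^0, \ldots, v_n^0;\ \mathsf{Eval}_{pk}(\cdot,\ldots,\cdot,G)\bigr)$,

(c) $z^n = \mathsf{MEval}_{PK_1,\ldots,PK_n}\bigl(v_1^0, \ldots, v_{n-1}^0, v_n^1;\ \mathsf{Eval}_{pk}(\cdot,\ldots,\cdot,G)\bigr)$;

2. and the $n$ ciphertexts, one of which encrypts $G(r_1, \ldots, r_n)$, namely:

(a) $z^{n+1} = \mathsf{MEval}_{PK_1,\ldots,PK_n}\bigl(v_1^0, v_2^1, \ldots, v_n^1;\ \mathsf{Eval}_{pk}(\cdot,\ldots,\cdot,G)\bigr)$,

(b) $z^{n+2} = \mathsf{MEval}_{PK_1,\ldots,PK_n}\bigl(v_1^1, v_2^0, v_3^1, \ldots, v_n^1;\ \mathsf{Eval}_{pk}(\cdot,\ldots,\cdot,G)\bigr)$,

(c) $z^{2n} = \mathsf{MEval}_{PK_1,\ldots,PK_n}\bigl(v_1^1, \ldots, v_{n-1}^1, v_n^0;\ \mathsf{Eval}_{pk}(\cdot,\ldots,\cdot,G)\bigr)$.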
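The claim that these $2n$ evaluations always include both useful results when exactly one $b_i = 1$ can be checked mechanically on plaintexts. The Python sketch below omits all encryption, uses an arbitrary stand-in for `G`, and assumes (our convention) that $b_i = 0$ means client $D_i$ sends $(x_i, r_i)$, so $v_i^0 = x_i$. Entry $i$ of the first group takes $v_i^1$ and $v_j^0$ for $j \neq i$; entry $i$ of the second group takes $v_i^0$ and $v_j^1$ for $j \neq i$. The helper names are hypothetical.

```python
import random


def client_pair(x, r, b):
    """(v^0, v^1) for one client: b = 0 -> (x, r), b = 1 -> (r, x)."""
    return (x, r) if b == 0 else (r, x)


def one_hot_bits(n, rng):
    """Exactly one b_i = 1, at a uniformly random position i; all other bits are 0."""
    i = rng.randrange(n)
    return [1 if j == i else 0 for j in range(n)]


def server_outputs(pairs, G):
    """The 2n evaluations z^1..z^{2n} from the list above, without encryption."""
    n = len(pairs)
    # Group 1: client i contributes v_i^1, every other client contributes v_j^0.
    first = [G(*(pairs[j][1 if j == i else 0] for j in range(n))) for i in range(n)]
    # Group 2: client i contributes v_i^0, every other client contributes v_j^1.
    second = [G(*(pairs[j][0 if j == i else 1] for j in range(n))) for i in range(n)]
    return first + second


def as_tuple(*args):
    """Stand-in for G that records which plaintexts were combined."""
    return args


n = 5
xs = [f"x{k}" for k in range(1, n + 1)]
rs = [f"r{k}" for k in range(1, n + 1)]
bits = one_hot_bits(n, random.Random())
pairs = [client_pair(x, r, b) for x, r, b in zip(xs, rs, bits)]
z = server_outputs(pairs, as_tuple)
assert len(z) == 2 * n                    # instead of 2**n with independent bits
assert tuple(xs) in z and tuple(rs) in z  # both useful results are always present
```

The position of the useful results reveals where the single 1 bit sits, which is one reason the bits must be generated so that the server cannot learn them.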

The above idea ensures that the complexity of the worker remains polynomial (the complexity of the clients is still independent of $F$, except in the pre-processing phase). Two (linked) issues remain to be addressed: (1) how the clients can generate the bits $b_1, \ldots, b_n$ with the required distribution without interacting with each other, and (2) the security of the resulting protocol. These issues are addressed in Appendix E.

5.2  Minimizing  Interaction  in  Offline  Phase 

Recall that in the offline phase, the clients need to execute a secure computation protocol in order to verify and obtain the output of the computation. Since we work in the pre-processing model, we can use a specific multi-party computation protocol in order to reduce the round complexity of the clients in this phase. More specifically, we can use any secure computation protocol, even one that makes use of pre-processing, so long as this pre-processing is reusable for multiple runs of the protocol. Such a protocol exists due to the construction of Asharov et al. [AJLA+12], which is a 2-round secure computation protocol in the reusable pre-processing model with a CRS (note that in our case, the clients can compute the CRS needed for this protocol during the initial pre-processing phase). Using this protocol, we obtain a multi-party verifiable computation protocol in which the round complexity of the clients in the offline phase is 2.

A Related works

Interactive Solutions. Goldwasser et al. [GKR08] show how to build an interactive proof between a client and a server to verify arbitrary polynomial-time computations in almost linear time. Because of its interactive nature, this protocol is not suited to the multi-client case (as discussed above, this would require the clients to all be present and interacting with the server during the computation, whereas our model allows only a single message exchanged between server and client during the online phase).²

SNARGs. A class of solutions is based on succinct non-interactive arguments (SNARGs): (computationally sound [BCC88]) proofs that are very short and very efficient to verify, regardless of the complexity of the function being evaluated. Solutions of this type are usually constructed using Probabilistically Checkable Proofs (PCPs), long proofs that the verifier can check in only very few places (in particular, only a constant number of bits of the proof are needed for NP languages). Kilian [Kil92] showed how to use PCPs to construct interactive succinct arguments by committing to the entire PCP string using a Merkle tree. Micali [Mic94] removed the interaction by use of a random oracle. Recent work [BCCT12, GLR11, DFH12] has replaced the random oracle with "extractable collision-resistant hash functions" (ECRHs), a non-falsifiable [Nao03] assumption that any algorithm that computes an image of the ECRH must "know" the corresponding pre-image.

There are alternative constructions of SNARGs based on different forms of arithmetization of Boolean computations, used together with cryptographic constructions based on bilinear maps (e.g., [Gro10, Lip12, GGPR12]). Those protocols also rely on non-falsifiable "knowledge" assumptions over the cryptographic groups used in the constructions. We note that, following [AF07, GW11], such "knowledge" assumptions seem unavoidable when using SNARGs.

² A non-interactive argument for a restricted class of functions is also presented in [GKR08]. We did not investigate whether this could be turned into a multi-client non-interactive protocol (though we suspect it could, when coupled with an FHE scheme), because the focus of this paper is a general solution for arbitrary polynomial computations.

As we pointed out in the previous section, SNARGs coupled with FHE would yield a conceptually simple protocol for the multi-client case, but at the cost of using either the random oracle or a non-falsifiable assumption. Our protocol instead relies on standard cryptographic assumptions (in particular, the existence of FHE).

Verifiable Computation. Gennaro, Gentry and Parno [GGP10] and subsequent works [CKV10, AIK10] present a verifiable computation (VC) scheme that allows a client to outsource any efficiently computable function to a worker and verify the worker's computation in constant time, where the worker's complexity is only linear in the size of the circuit, assuming a one-time expensive preprocessing phase that depends on the function being outsourced. We use their approach, and more specifically we use the protocol from [CKV10] as our starting point. [PRV12] show how to construct a protocol for outsourcing the computation of a smaller class of functions starting from any attribute-based encryption scheme. Their solution does not handle input privacy, however.

Server-Aided Secure Function Evaluation. The first work to explicitly consider outsourced computation in the multi-client case is by Kamara et al. [KMR11] (see also [KMR12], which reports on some optimizations of [KMR11] and implementation results). The main limitation of these works is a non-colluding adversarial model, in which the server and the clients may maliciously depart from the protocol specification, but without a common strategy or communication. A simpler protocol is also presented in [CKKC12], but it assumes only semi-honest parties. We stress that our work is the first to achieve full simulation security in the most stringent adversarial model.

B  Security  definition 

We  now  formally  describe  the  ideal  and  real  models  of  computation  and  then  give  our  security  definition. 

Ideal World. In the ideal world, there is a trusted party $T$ that computes the desired functionality $F$ on the inputs of all parties. Unlike the standard ideal world model, here we allow for multiple evaluations of the functionality on different sets of inputs. An execution of the ideal world consists of an (unbounded) polynomial number of repetitions of the following:

Inputs: $D_1$ and $D_2$ have inputs $x_1$ and $x_2$, respectively. The worker has no input. All parties send their inputs to the trusted party $T$. Additionally, corrupted parties may change their inputs before sending them to $T$.

Trusted party computes output: $T$ computes $F(x_1, x_2)$.

Adversary learns output: $T$ returns $F(x_1, x_2)$ to $\mathcal{A}$.

Honest parties learn output: $\mathcal{A}$ prepares a (possibly empty) list of honest clients that should get the output and sends it to $T$. $T$ sends $F(x_1, x_2)$ to this set of honest clients and $\bot$ to the other honest clients in the system.

Outputs: All honest parties output whatever $T$ gives them. Corrupted parties, wlog, output $\bot$. The view of $\mathcal{A}$ in the ideal world execution above includes the inputs of corrupt parties, the outputs of all parties, as well as the entire view of all corrupt parties in the system. $\mathcal{A}$ can output any arbitrary function of its view, and we denote the random variable consisting of this output, along with the outputs of all honest parties, by

$$\mathrm{IDEAL}_{F,\mathcal{A}}(x_1, x_2).$$
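One repetition of this ideal-world execution can be pictured as the following Python sketch of the trusted party $T$. It is purely illustrative: the function name `ideal_round`, the string labels for parties, and the way the adversary's delivery list is passed in are our own choices, and `None` stands for $\bot$.

```python
from typing import Any, Callable, Dict, Iterable


def ideal_round(
    F: Callable[[Any, Any], Any],
    x1: Any,
    x2: Any,
    honest_clients: Iterable[str],
    deliver_to: Iterable[str],
) -> Dict[str, Any]:
    """What T hands out in one repetition: key 'A' for the adversary, one key per honest client.

    x1, x2 are the inputs T actually receives (corrupted parties may have replaced theirs);
    deliver_to is the (possibly empty) list of honest clients chosen by the adversary.
    """
    y = F(x1, x2)                 # T computes F(x1, x2)
    out: Dict[str, Any] = {"A": y}  # the adversary learns the output first
    chosen = set(deliver_to)
    for client in honest_clients:
        out[client] = y if client in chosen else None  # None plays the role of the reject symbol
    return out
```

For example, `ideal_round(lambda a, b: a + b, 2, 3, ["D1"], [])` gives the adversary 5 while the honest client $D_1$ receives $\bot$, which is exactly the selective-delivery power granted to $\mathcal{A}$ above.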

Real World. In the real world, there is no trusted party, and the parties interact directly with each other according to a protocol $\Pi$. Honest parties follow all instructions of $\Pi$, while adversarial parties are coordinated by a single adversary $\mathcal{A}$ and may behave arbitrarily. At the conclusion of the protocol, honest clients compute their output as prescribed by the protocol.

For any set of adversarial parties (that may include a corrupt worker and a corrupt $D_1$ or $D_2$) controlled by $\mathcal{A}$ and protocol $\Pi$ for computing function $F$, we let $\mathrm{REAL}_{\Pi,\mathcal{A}}(x_1, x_2)$ be the random variable denoting the output of $\mathcal{A}$ in the real world execution above, along with the output of the honest parties. $\mathrm{REAL}_{\Pi,\mathcal{A}}(x_1, x_2)$ can be an arbitrary function of the view of $\mathcal{A}$, which consists of the inputs (and random tape) of corrupt parties, the outputs of all parties in the protocol, as well as the entire view of all corrupt parties in the system.

Security Definition. Intuitively, we require that for every adversary in the real world there exists an adversary in the ideal world such that the views of these two adversaries are computationally indistinguishable. Formally,

Definition 2 Let $F$ and $\Pi$ be as above. We say that $\Pi$ is a secure verifiable computation protocol for computing $F$ if for every PPT adversary $\mathcal{A}$ that corrupts either $D_1$ or $D_2$, and additionally possibly corrupts the worker, in the real model, there exists a PPT adversary $\mathcal{S}$ (that corrupts the same set of parties as $\mathcal{A}$) in the ideal execution, such that:

$$\mathrm{IDEAL}_{F,\mathcal{S}}(x_1, x_2) \stackrel{c}{\approx} \mathrm{REAL}_{\Pi,\mathcal{A}}(x_1, x_2)$$


C  Building  Blocks 

We  now  describe  the  building  blocks  we  require  in  our  protocol. 

C.1 Statistically Binding Commitments

We make use of statistically binding commitments in our protocol. A statistically binding commitment scheme consists of two probabilistic algorithms: COM and OPEN. COM takes as input a message $m$ from the sender and outputs a "commitment" to $m$, denoted by $c$, to the receiver and a "decommitment" to $m$, denoted by $d$, to the sender. OPEN takes as input $c$ and $d$ and outputs a message $m$ (denoting the message that was committed to) or outputs $\bot$, denoting reject. The correctness property of a commitment scheme requires that for all honestly executed COM and OPEN, we have $\mathrm{OPEN}(\mathrm{COM}(m)) = m$, except with negligible probability. Informally, a statistically binding commitment scheme has the security property that no (computationally unbounded) sender can commit to a message $m$ and have it decommit to some other message. In other words, no computationally unbounded sender can come up with a commitment $c$ and two decommitments $d$ and $d'$ such that $\mathrm{OPEN}(c, d) = m$ and $\mathrm{OPEN}(c, d') = m'$ for different $m$ and $m'$. In our protocol, we make use of Naor's two-round statistically binding commitment scheme [Nao89]. At a high level, the commitment scheme, based on any pseudorandom generator $G$ from $k$ bits to $3k$ bits, works as follows: in the commitment phase, the receiver sends a random $3k$-bit string $r$ to the sender. The sender picks a seed $s$ (of length $k$) for the pseudorandom generator at random and sends $G(s)$ to commit to 0 and $G(s) \oplus r$ to commit to 1. The decommitment is simply the bit and the seed $s$. This scheme is statistically binding and computationally hiding.
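Naor's scheme is short enough to sketch directly. The Python code below is a toy illustration under explicit assumptions: it uses SHAKE-256 from `hashlib` as a heuristic stand-in for the pseudorandom generator $G: \{0,1\}^k \to \{0,1\}^{3k}$ (not something prescribed by [Nao89]), works on byte strings rather than bits, fixes $k = 128$, and lets the opening routine also take the receiver's first message $r$. Binding follows because equivocating requires seeds $s, s'$ with $G(s) \oplus G(s') = r$; there are at most $2^{2k}$ such values, so a uniformly random $3k$-bit $r$ hits one with probability at most $2^{-k}$.

```python
import hashlib
import secrets

K_BYTES = 16  # k = 128-bit seeds; G stretches them to 3k = 384 bits


def prg(seed: bytes) -> bytes:
    """Stand-in PRG G: k bits -> 3k bits (SHAKE-256, used heuristically)."""
    return hashlib.shake_256(seed).digest(3 * K_BYTES)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))


def receiver_first_message() -> bytes:
    """Round 1: the receiver sends a uniformly random 3k-bit string r."""
    return secrets.token_bytes(3 * K_BYTES)


def commit(bit: int, r: bytes):
    """Round 2: c = G(s) commits to 0, c = G(s) xor r commits to 1; d = (bit, s)."""
    s = secrets.token_bytes(K_BYTES)
    c = prg(s) if bit == 0 else xor(prg(s), r)
    return c, (bit, s)


def open_commitment(r: bytes, c: bytes, d):
    """OPEN: recompute the commitment from (bit, s); return the bit, or None for reject."""
    bit, s = d
    if bit not in (0, 1):
        return None
    expected = prg(s) if bit == 0 else xor(prg(s), r)
    return bit if secrets.compare_digest(expected, c) else None
```

A typical run is `r = receiver_first_message()`, then `c, d = commit(1, r)` on the sender side, and finally `open_commitment(r, c, d)` returns 1 in the reveal phase.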

C.2 Fully Homomorphic Encryption

In our constructions, we make use of a fully homomorphic encryption (FHE) scheme [Gen09, BV11, BGV12]. An FHE scheme consists of four algorithms: (a) a key generation algorithm $\mathsf{Gen}(1^\kappa)$ that takes as input the security parameter and outputs a public key/secret key pair $(pk, sk)$; (b) a randomized encryption algorithm $\mathsf{Enc}_{pk}(m)$ that takes as input the public key and a message $m$ and produces a ciphertext $c$; (c) a decryption algorithm $\mathsf{Dec}_{sk}(c)$ that takes as input the secret key and a ciphertext $c$ and produces a message $m$; and (d) a deterministic³ evaluation algorithm $\mathsf{Eval}_{pk}(c, F)$ that takes as input a ciphertext $c$ (that encrypts a message $m$), the public key, and (the circuit description of) a PPT function $F$, and produces a ciphertext $c^*$.

The correctness of the encryption, decryption, and evaluation algorithms requires that for all key pairs output by $\mathsf{Gen}$, $\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m)) = m$ for all $m$ (except with negligible probability), and that $\mathsf{Dec}_{sk}(\mathsf{Eval}_{pk}(\mathsf{Enc}_{pk}(m), F)) = F(m)$ for all $m$ and PPT $F$ (except with negligible probability). The compactness property of an FHE scheme requires the following: let $c^* \leftarrow \mathsf{Eval}_{pk}(c, F)$; there exists a polynomial $P$ such that $|c^*| \le P(\kappa)$. (In other words, the size of $c^*$ is independent of the size of the circuit description of $F$.) Finally, the security definition for an FHE scheme is that of standard semantic security for encryption schemes.

³ The $\mathsf{Eval}$ algorithm need not be deterministic in general, but we require that the algorithm be deterministic. There are plenty of such schemes available based on a variety of assumptions.

We  note  that  FHE  schemes  are  known  to  exist  from  a  variety  of  cryptographic  assumptions  such  as  the 
learning  with  errors  (LWE)  assumption  [Reg05]. 

C.3  Multikey  Fully  Homomorphic  Encryption 

In our construction, we use a multikey fully homomorphic encryption (MFHE) scheme [LTV12] to avoid interaction between the players during the online phase of our protocol. An MFHE scheme is defined as a four-tuple of algorithms $(\mathsf{MGen}, \mathsf{MEnc}, \mathsf{MEval}, \mathsf{MDec})$: (a) a key generation algorithm $(MPK, MSK) \leftarrow \mathsf{MGen}(1^\kappa)$ that takes as input the security parameter and outputs a public key/secret key pair $(MPK, MSK)$; (b) a randomized encryption algorithm $c \leftarrow \mathsf{MEnc}_{MPK}(m)$ that takes as input the public key and a message $m$ and produces a ciphertext $c$; (c) a decryption algorithm $m \leftarrow \mathsf{MDec}_{MSK_1,\ldots,MSK_n}(c)$ that takes as input $n$ secret keys $MSK_i$ and a ciphertext $c$, and outputs a message $m$; and (d) a deterministic evaluation algorithm $c^* \leftarrow \mathsf{MEval}_{MPK_1,\ldots,MPK_n}(c_1, \ldots, c_n, F)$ that takes as input (the circuit description of) a PPT function $F$ and a list of ciphertexts $c_1, \ldots, c_n$ along with the corresponding public keys $MPK_1, \ldots, MPK_n$, and produces a new ciphertext $c^*$.

An MFHE scheme must satisfy the following two requirements. (a) Correctness: for every $c^* \leftarrow \mathsf{MEval}_{MPK_1,\ldots,MPK_n}(c_1, \ldots, c_n, F)$, where $c_i \leftarrow \mathsf{MEnc}_{MPK_i}(m_i)$, it must be that $\mathsf{MDec}_{MSK_1,\ldots,MSK_n}(c^*) = F(m_1, \ldots, m_n)$. (b) Compactness: let $c^* \leftarrow \mathsf{MEval}_{MPK_1,\ldots,MPK_n}(c_1, \ldots, c_n, F)$; then there exists a polynomial $P$ such that $|c^*| \le P(\kappa, n)$.

The security definition for an MFHE scheme is that of standard semantic security for encryption schemes. We remark that in the above description, for simplicity of notation, we do not explicitly mention "evaluation keys", and simply assume that they are part of the public keys.

We note that an MFHE scheme was recently constructed by Lopez-Alt et al. [LTV12] based on NTRU [HPS98].

C.4  Single-Client  Verifiable  Computation 

As a building block for our solution, we use the recently proposed method for single-client verifiable computation [CKV10]. For concreteness, we briefly describe the solution in [CKV10], which can be based on any fully homomorphic encryption (FHE) scheme [Gen09]. The high-level idea of their protocol for outsourcing a function $F$ is as follows. The client picks a random $r$ and computes $F(r)$ in the preprocessing phase. Next, in the online phase, after receiving the input $x$, the client picks a random bit $b$ and sends either $(x, r)$ or $(r, x)$ to the server (depending on the bit $b$). The server must compute $F$ on both $x$ and $r$ and return the responses to the client. The client checks that $F(r)$ matches the pre-computed value and, if so, accepts the other response as the correct $F(x)$. Now, if $x$ comes from the uniform distribution, this protocol is sound, and a cheating server can succeed only with probability $1/2$ (as he cannot distinguish $(x, r)$ from $(r, x)$ with probability better than $1/2$). For arbitrary distributions this approach fails, but this can be rectified by having the client additionally pick a public key for an FHE scheme (in the preprocessing phase) and send $(\mathsf{Enc}_{pk}(x), \mathsf{Enc}_{pk}(r))$ or $(\mathsf{Enc}_{pk}(r), \mathsf{Enc}_{pk}(x))$, depending on the bit $b$, in the online phase. The server homomorphically evaluates the function $F$ and responds with $\mathsf{Enc}_{pk}(F(x))$ and $\mathsf{Enc}_{pk}(F(r))$. This protocol is sound for arbitrary distributions of $x$ (as a cheating server cannot distinguish $(\mathsf{Enc}_{pk}(x), \mathsf{Enc}_{pk}(r))$ from $(\mathsf{Enc}_{pk}(r), \mathsf{Enc}_{pk}(x))$). One can reduce the soundness error to be negligibly small by picking random $r_1, \ldots, r_\kappa$ and having the client pick bits $b_1, \ldots, b_\kappa$ and send $(\mathsf{Enc}_{pk}(x), \mathsf{Enc}_{pk}(r_i))$ (or the other way around, depending on $b_i$). The client checks that all values of $F(r_i)$ were correct and that the $\kappa$ different values for $F(x)$ were identical and, if so, accepts $F(x)$. In order to make this protocol reusable with the same values of $r_1, \ldots, r_\kappa$, [CKV10] need to run this entire protocol under one more layer of fully homomorphic encryption.
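The cut-and-choose logic of [CKV10] can be illustrated without any encryption. In the hedged Python sketch below, the server is modeled as a function that only receives the $\kappa$ unlabeled pairs. In the real scheme it is the FHE layer that hides which slot holds $x$; in this plaintext toy $x$ visibly recurs in every pair, so the hiding is simply assumed by restricting the cheating strategy to ignore the values it is given. The names `delegate`, `honest_server` and `cheat_slot0` are ours. A server that tampers with the same slot in every pair is accepted only if all $\kappa$ hidden bits put $x$ in that slot, i.e., with probability $2^{-\kappa}$.

```python
import secrets
from typing import Callable, List, Optional, Tuple

Server = Callable[[List[Tuple[int, int]]], List[Tuple[int, int]]]


def delegate(F: Callable[[int], int], x: int, kappa: int, server: Server) -> Optional[int]:
    """One run of the plaintext cut-and-choose check; returns F(x) or None (reject)."""
    # Preprocessing: random r_i together with the client's own values F(r_i).
    rs = [secrets.randbelow(2**32) for _ in range(kappa)]
    f_rs = [F(r) for r in rs]
    # Online phase: the hidden bit b_i decides whether pair i is (x, r_i) or (r_i, x).
    bs = [secrets.randbelow(2) for _ in range(kappa)]
    answers = server([(x, r) if b == 0 else (r, x) for r, b in zip(rs, bs)])
    if len(answers) != kappa:
        return None
    fx_values = set()
    for (a0, a1), b, f_r in zip(answers, bs, f_rs):
        f_x, f_r_claimed = (a0, a1) if b == 0 else (a1, a0)
        if f_r_claimed != f_r:
            return None              # a check value F(r_i) is wrong
        fx_values.add(f_x)
    # Accept only if all kappa claimed values of F(x) agree.
    return fx_values.pop() if len(fx_values) == 1 else None


def honest_server(F: Callable[[int], int]) -> Server:
    return lambda pairs: [(F(a), F(b)) for a, b in pairs]


def cheat_slot0(F: Callable[[int], int]) -> Server:
    """Tampers with slot 0 of every pair without looking at what the slot holds."""
    return lambda pairs: [(F(a) + 1, F(b)) for a, b in pairs]
```

With $\kappa = 40$, the cheating server above is rejected except with probability $2^{-40}$, while the honest server always yields the correct $F(x)$.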
C.5 Secure Computation

We make use of a two-party secure computation protocol (between parties $D_1$ and $D_2$) that is secure in the standard ideal/real world paradigm.

Ideal World. In the ideal world, there is a trusted party $T$ that computes the desired functionality $F$ on the inputs of the two parties. An execution of the ideal world consists of the following:

Inputs: $D_1$ and $D_2$ have inputs $x_1$ and $x_2$, respectively, and send their inputs to the trusted party $T$. Additionally, a corrupted party may change its input before sending it to $T$.

Trusted party computes output: $T$ computes $F(x_1, x_2)$.

Adversary learns output: $T$ returns $F(x_1, x_2)$ to $\mathcal{A}$ (here, either $D_1$ or $D_2$ is controlled by the adversary $\mathcal{A}$).

Honest parties learn output: $\mathcal{A}$ determines whether the honest party should get the output and sends this to $T$. $T$ sends $F(x_1, x_2)$ to the honest party if the adversary says so, and $\bot$ otherwise.

Outputs: Honest parties output whatever $T$ gives them. Corrupted parties, wlog, output $\bot$. The view of $\mathcal{A}$ in the ideal world execution above includes the inputs of corrupt parties, the outputs of all parties, as well as the entire view of all corrupt parties in the system. $\mathcal{A}$ can output any arbitrary function of its view, and we denote the random variable consisting of this output, along with the outputs of all honest parties, by

$$\mathrm{IDEAL}_{F,\mathcal{A}}(x_1, x_2).$$

Real World. In the real world, there is no trusted party, and the parties interact directly with each other according to a protocol $\Pi_{2pc}$. Honest parties follow all instructions of $\Pi_{2pc}$, while adversarial parties are coordinated by a single adversary $\mathcal{A}$ and may behave arbitrarily. At the conclusion of the protocol, honest clients compute their output as prescribed by the protocol.

For any set of adversarial parties (that is, a corrupt $D_1$ or $D_2$) controlled by $\mathcal{A}$ and protocol $\Pi_{2pc}$ for computing function $F$, we let $\mathrm{REAL}_{\Pi_{2pc},\mathcal{A}}(x_1, x_2)$ be the random variable denoting the output of $\mathcal{A}$ in the real world execution above, along with the output of the honest parties. $\mathrm{REAL}_{\Pi_{2pc},\mathcal{A}}(x_1, x_2)$ can be an arbitrary function of the view of $\mathcal{A}$, which consists of the inputs (and random tape) of corrupt parties, the outputs of all parties in the protocol, as well as the entire view of all corrupt parties in the system.

Security Definition. Intuitively, we require that for every adversary in the real world there exists an adversary in the ideal world such that the views of these two adversaries are computationally indistinguishable. Formally,

Definition 3 Let $F$ and $\Pi_{2pc}$ be as above. Protocol $\Pi_{2pc}$ is a secure protocol for computing $F$ if for every PPT adversary $\mathcal{A}$ that corrupts either $D_1$ or $D_2$ in the real model, there exists a PPT adversary $\mathcal{S}_{2pc}$ (that corrupts the same party as $\mathcal{A}$) in the ideal execution, such that:

$$\mathrm{IDEAL}_{F,\mathcal{S}_{2pc}}(x_1, x_2) \stackrel{c}{\approx} \mathrm{REAL}_{\Pi_{2pc},\mathcal{A}}(x_1, x_2)$$


D  Proof  details 

D.1 Indistinguishability of the Views

In order to prove Theorem 1, we consider a series of hybrid experiments $\mathcal{H}_0, \ldots, \mathcal{H}_5$, where $\mathcal{H}_0$ represents the real world execution, while $\mathcal{H}_5$ corresponds to the simulated execution in the ideal world. We show that each consecutive pair of hybrid experiments is computationally indistinguishable. We can therefore conclude that $\mathcal{H}_0$ and $\mathcal{H}_5$ are computationally indistinguishable, as required.
Experiment $\mathcal{H}_0$. This experiment corresponds to the real world execution. The simulator simply uses the honest party's input and runs the honest party algorithm in the protocol execution.

Experiment $\mathcal{H}_1$. This experiment is the same as $\mathcal{H}_0$, except that in the pre-processing phase, $\mathcal{S}$ runs the simulators $\mathcal{S}_{fhe}$, $\mathcal{S}_{prf}$ and $\mathcal{S}_{test}$ instead of running the honest party algorithm. Note that the functionalities $F_{fhe}$, $F_{prf}$ and $F_{test}$ are still computed honestly, in the same manner as in $\mathcal{H}_0$.

Indistinguishability of $\mathcal{H}_0$ and $\mathcal{H}_1$: From the security of the two-party computation protocols $\Pi_{fhe}$, $\Pi_{prf}$ and $\Pi_{test}$, it immediately follows that the output distributions of $\mathcal{H}_0$ and $\mathcal{H}_1$ are computationally indistinguishable.

Experiment $\mathcal{H}_2$. This experiment is the same as $\mathcal{H}_1$, except that in the offline phase, $\mathcal{S}$ runs the simulator $\mathcal{S}_{ver}$ instead of running the honest party algorithm. $\mathcal{S}$ answers the output query of $\mathcal{S}_{ver}$ by computing $F_{ver}$ in the same manner as in the description of $\mathcal{S}$.

Indistinguishability of $\mathcal{H}_1$ and $\mathcal{H}_2$: From the security of the two-party computation protocol $\Pi_{ver}$, it immediately follows that the output distributions of $\mathcal{H}_1$ and $\mathcal{H}_2$ are computationally indistinguishable.

Experiment $\mathcal{H}_3$. This experiment is the same as $\mathcal{H}_2$, except that $\mathcal{S}$ computes the bits $b_{i,j}$ for the honest party $P_i$ as random bits (instead of computing them pseudorandomly).

Indistinguishability of $\mathcal{H}_2$ and $\mathcal{H}_3$: This follows immediately from the security of the PRF.

Experiment $\mathcal{H}_4$. This experiment is the same as $\mathcal{H}_3$, except that now, in order to compute the final output of $F_{ver}$, $\mathcal{S}$ queries the ideal functionality $F$ instead of performing decryption in the final step.

Indistinguishability of $\mathcal{H}_3$ and $\mathcal{H}_4$: We now claim that hybrids $\mathcal{H}_3$ and $\mathcal{H}_4$ are statistically indistinguishable. Towards a contradiction, suppose that there exists a distinguisher $D$ that can distinguish between the output distributions of $\mathcal{H}_3$ and $\mathcal{H}_4$ with inverse polynomial probability $p(n)$. Note that the only difference between $\mathcal{H}_3$ and $\mathcal{H}_4$ is the manner in which the final outputs are computed. In other words, the existence of such a distinguisher implies that the outputs computed in $\mathcal{H}_3$ and $\mathcal{H}_4$ differ. However, conditioned on the event that the worker $W$ performs the computation correctly, the checks performed by $F_{ver}$ corresponding to the inputs of the parties (i.e., step 2(c) in the description of $F_{ver}$) guarantee that the outputs in both experiments must be the same. Thus, from check 2(b) of $F_{ver}$, the existence of such a distinguisher $D$ implies that with inverse polynomial probability $p'(n)$, the worker $W$ is able to provide incorrect answers at positions $p_j$ and correct answers at positions $4 - p_j$, for all $j \in [n]$. We now obtain a contradiction using the soundness lemma of Chung et al. [CKV10].

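In more detail, we now consider an experiment $\mathcal{G}$ in which the simulator interacts with the server as in $\mathcal{H}_4$, and then stops the experiment at the end of the online phase. That is, as in $\mathcal{H}_4$, for every $j \in [n]$, $\mathcal{S}$ prepares each $X_{i,j} \leftarrow \mathsf{Enc}_{PK}(\mathsf{Enc}_{pk}(x_i))$ and $R_{i,j} \leftarrow \mathsf{Enc}_{PK}(r_{i,j})$. Now, consider an alternate experiment $\mathcal{G}'$ that is the same as $\mathcal{G}$, except that $\mathcal{S}$ now prepares $X_{i,j} \leftarrow \mathsf{Enc}_{PK}(r_{i,j})$. Then the following inequality follows from the semantic security of the (outer layer) FHE scheme:

$$\Pr\bigl[W \text{ correct on } (R_{i,1}, \ldots, R_{i,n}) \text{ and incorrect on } (X_{i,1}, \ldots, X_{i,n}) \text{ in } \mathcal{G}\bigr] \le \Pr\bigl[W \text{ correct on } (R_{i,1}, \ldots, R_{i,n}) \text{ and incorrect on } (X_{i,1}, \ldots, X_{i,n}) \text{ in } \mathcal{G}'\bigr] + \mathrm{negl}(\kappa)$$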

Note that to obtain the above inequality, we rely on the fact that the outsourced function is a PPT function, and thus we can check whether $W$ is correct or incorrect by executing the $\mathsf{Eval}$ algorithm. Note also that the simulator knows the positions where the $\mathsf{Eval}$ checks must be performed, since it knows the PRF key of the adversary and can therefore compute its random bits.
Now, it is easy to see that:

$$\Pr\bigl[W \text{ correct on } (R_{i,1}, \ldots, R_{i,n}) \text{ and incorrect on } (X_{i,1}, \ldots, X_{i,n}) \text{ in } \mathcal{G}'\bigr] \le \frac{1}{2^{n}}$$

Thus, combining the above two inequalities, we arrive at a contradiction. We refer the reader to [CKV10] for more details.
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Experiment  'H\.  This  experiment  is  the  same  as  % 3,  except  that  instead  of  encrypting  the  input  of  I\  honestly, 
S  computes  Xip, . . . ,  Xi<n  as  encryptions  of  the  all  zeros  string.  Note  that  this  experiment  corresponds  to  the 
ideal  world. 

Indistinguishability of $\mathcal{H}_3$ and $\mathcal{H}_4$: From the semantic security of the (inner layer) FHE scheme (Gen, Enc, Dec, Eval), it immediately follows that the output distributions of $\mathcal{H}_3$ and $\mathcal{H}_4$ are computationally indistinguishable. In more detail, assume for contradiction that there exists a PPT distinguisher D that can distinguish with non-negligible probability between the output distributions of $\mathcal{H}_3$ and $\mathcal{H}_4$. Then we construct an adversary B that breaks the semantic security of the FHE scheme. Adversary B takes a public key pk from the challenger C of the FHE scheme and then runs the simulator $\mathcal{S}$ to generate an output distribution for D. Specifically, B follows the same strategy as $\mathcal{S}$, except that in the pre-processing phase it forces pk as the inner-layer public key. B then sends vectors $m_0, m_1$ as its challenge messages to C, where each element of $m_0$ is set to $x_1$ and each element of $m_1$ is set to the all-zeros string. On receiving the challenge ciphertext vector from C, adversary B simply uses it to continue the rest of the simulation as $\mathcal{S}$ does. Finally, B outputs whatever D outputs. Now, note that if the challenge ciphertexts correspond to $m_0$, then the resulting output distribution is the same as in experiment $\mathcal{H}_3$; otherwise, it is the same as in $\mathcal{H}_4$. Thus, by assumption D (and therefore B) succeeds in distinguishing with non-negligible probability, contradicting the semantic security of the FHE scheme.
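The structure of the reduction B above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' construction: the names `challenge_vectors`, `ToyChallenger`, and `reduction_B` are hypothetical, the toy challenger uses a one-time pad in place of the inner-layer FHE scheme, and the public key together with the rest of the simulator is folded into a caller-supplied `simulate` callback.

```python
import secrets
from typing import Callable, List, Sequence, Tuple


def challenge_vectors(x1: bytes, n: int) -> Tuple[List[bytes], List[bytes]]:
    """m0 repeats the honest client's input x1 n times; m1 holds n all-zero strings."""
    m0 = [x1] * n
    m1 = [bytes(len(x1))] * n  # bytes(k) is k zero bytes
    return m0, m1


class ToyChallenger:
    """Hypothetical semantic-security challenger over a toy one-time pad (NOT an FHE scheme)."""

    def __init__(self) -> None:
        self.b = secrets.randbits(1)  # hidden challenge bit

    def challenge(self, m0: Sequence[bytes], m1: Sequence[bytes]) -> List[bytes]:
        chosen = m1 if self.b else m0
        # A fresh pad per message makes every ciphertext uniform, whatever was encrypted.
        return [bytes(a ^ p for a, p in zip(m, secrets.token_bytes(len(m)))) for m in chosen]


def reduction_B(x1: bytes, n: int, challenger: ToyChallenger,
                simulate: Callable[[List[bytes]], object],
                D: Callable[[object], int]) -> int:
    """Embed the challenge ciphertexts where the simulator would place X_{1,1..n}."""
    m0, m1 = challenge_vectors(x1, n)
    cts = challenger.challenge(m0, m1)
    view = simulate(cts)  # continue the simulation exactly as S would
    return D(view)        # B outputs whatever D outputs
```

Because B forwards the challenge ciphertexts unchanged and outputs D's bit, any advantage D has in telling the two experiments apart becomes B's advantage in the semantic-security game.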

D.2 Security of the Many-time Verifiable Computation Scheme

So far, we have shown that our protocol is one-time secure (namely, that soundness holds when the online and offline phases are executed once). We now proceed to show that the protocol is many-time secure (i.e., we can run an unbounded polynomial number of online and offline phases after a single run of the pre-processing phase). As in the works of [GGP10, CKV10], we work in a model in which, if the result of some computation returned by the server is rejected by any of the clients, the clients execute a new pre-processing phase and pick new parameters. To prove security, let us first recall how [CKV10] go from one-time to many-time security. [CKV10] show that a one-time delegation scheme can be converted into a many-time delegation scheme simply by executing the entire protocol under another layer of fully homomorphic encryption, with a fresh FHE public key chosen by the client for every execution of the online phase. We achieve multi-time security in a similar way: the clients independently pick a fresh public key for the multi-key homomorphic encryption scheme of [LTV12] and execute the one-time protocol under this layer of fully homomorphic encryption. The proof that our protocol is also multi-time secure is quite similar to that of [CKV10]; however, there are a few subtle changes that we need to make.

To see these changes, let us first understand the idea behind the proof of [CKV10]. [CKV10] reduce the security of the multi-time scheme to that of the one-time scheme. That is, given an adversary that breaks the security of the multi-time scheme, they construct an adversary that breaks the security of the one-time scheme. Both adversaries execute the pre-processing phase in exactly the same manner. Let L be an upper bound on the total number of times the one-time stage is executed by the adversary. The one-time adversary that they construct picks one of these executions, say the $i$-th, at random and attempts to break the one-time security of that execution. In all other executions, the adversary “simulates” the protocol by sending encryptions of all-zeroes instead of the encryption of the actual message, which is a function of X (an encryption of the client’s input), R (an encryption of the client’s secret state used to verify the protocol), and the secret bit b (which is chosen fresh in every execution). In these executions (all executions other than the $i$-th), one can show via the semantic security of the FHE scheme that the messages sent in the real and simulated executions are indistinguishable. An important point here is that the client never rejects any of these executions and always proceeds with the computation as if it had accepted. In the $i$-th execution, the adversary picks the public and secret keys for the FHE scheme on its own. The adversary then encrypts the query under this public key and, once it obtains the worker's response from the many-time adversary, decrypts it using the secret key to obtain the response that the adversarial worker in the one-time game must produce.

We shall follow the same overall strategy. The one-time adversary that we build executes the pre-processing phase in exactly the same manner as the many-time adversary. We also pick one of the L executions at random and attempt to break the one-time security of that execution. However, unlike [CKV10], we cannot simulate the other executions by sending encryptions of all-zeroes. If we did so, the verification performed by the clients in the offline phase would necessarily fail, and in the event that one of the clients colludes with the server (which is unique to our setting), this information would be learned by the server and we would not be able to continue the simulation. The way around this is to simulate the other executions by sending the encryption of the message exactly as we would in a real run of the protocol, that is, as a function of X, R, and the secret bit b, except that we replace the r encrypted in R with all-zeroes (note that b is not part of the secret state, as it varies from execution to execution). By doing this, we do not use the secret state that is carried between executions anywhere in our simulation. Then, in the offline phase, our adversary runs the simulator for the two-party computation protocol and forces the clients to output “accept” and the result of the computation to be $F(x_1, x_2)$. Note that the client never rejects any of these executions and always proceeds with the computation as if it had accepted. The rest of the simulation works exactly as in [CKV10], and with these slight changes our proof of security goes through. We now present more details.

Let B be an adversary (controlling a cheating $D_2^*$ and $W^*$) that succeeds with non-negligible probability in breaking security when the online phase is executed multiple times. We construct an adversary A that also succeeds in breaking security with non-negligible probability, but when the online phase is executed only once.

-  A  executes  the  pre-processing  phase  in  exactly  the  same  manner  as  B.  In  other  words,  A  executes  the 
pre-processing  phase  exactly  as  the  simulator  described  in  Section  4.1  does. 

- Next, let L be an upper bound on the total number of online executions run by adversary B. A picks an index $1 \le i \le L$ at random, and this is the execution that it will use to distinguish between the real and ideal worlds.

- In every execution $k \neq i$, A proceeds as follows. A uses the honest client $D_1$’s input in that execution, say $x_1$, in computing the messages sent by $D_1$ in the online phase. However, instead of using the random values $R_{1,j}$ (for $1 \le j \le n$) in the online phase, A uses encryptions of the all-zero string. A executes the rest of the online phase exactly as an honest $D_1$ would (that is, A picks random bits $b_{1,j}$ for $1 \le j \le n$ and sends the ordered encryption tuples (of $x_1$ and 0) to W).

- In the $i$-th execution, A proceeds as follows. A picks the keys for the outermost layer of the FHE scheme on its own (both the public and secret keys $(pk^*, sk^*)$). Upon receiving a query from the client $D_1$ in the one-time execution, A prepares the query using this value and encrypts it under the public key $pk^*$. A sends this value to B. Upon receiving the response from B, A decrypts it using $sk^*$ and sends the result as the value sent by the malicious worker controlled by A in the one-time execution.

Note that if B terminates the game before the $i$-th execution, then A aborts and gives up.

- In the offline phase, for all executions $k \neq i$, A executes the simulator $\mathcal{S}_{\mathsf{ver}}$ for the two-party computation protocol $\Pi_{\mathsf{ver}}$ along with the corrupted client B to generate a simulated execution of $\Pi_{\mathsf{ver}}$. At some point during the simulation, $\mathcal{S}_{\mathsf{ver}}$ makes a query to the ideal functionality $\mathcal{F}_{\mathsf{ver}}$ with some input (say) Z. At this point, A simply returns $y = F(x_1, x_2)$ as the result of the computation to $\mathcal{S}_{\mathsf{ver}}$. On receiving this output value y from A, $\mathcal{S}_{\mathsf{ver}}$ continues the simulation of $\Pi_{\mathsf{ver}}$, forcing the output y on B. In the offline phase for the $i$-th execution, A executes the protocol honestly.
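The per-execution behavior of A can be summarized as a small dispatch loop. This is a hypothetical skeleton: `pick_index`, `run_reduction_A`, and the two callbacks stand in for the simulated and embedded executions described in the list above, and only the control flow is modeled (uniform guess of i, abort if B stops early, special treatment of execution i), not the cryptography.

```python
import random
from typing import Callable, List, Optional


def pick_index(L: int, rng: random.Random) -> int:
    """A's guess of the execution in which to embed the one-time challenge, uniform on {1..L}."""
    return rng.randint(1, L)  # randint includes both endpoints


def run_reduction_A(i: int, rounds_of_B: int,
                    simulate_k: Callable[[int], str],
                    embed_i: Callable[[int], str]) -> Optional[List[str]]:
    """Record which treatment each online execution k of B receives."""
    if rounds_of_B < i:
        return None  # B stopped before execution i: A aborts and gives up
    transcript: List[str] = []
    for k in range(1, rounds_of_B + 1):
        if k == i:
            # Own outer keys (pk*, sk*); relay D_1's query to B, decrypt B's answer
            # with sk*; the offline phase is run honestly.
            transcript.append(embed_i(k))
        else:
            # Honest x_1 but zeros instead of R_{1,j}; the offline phase is
            # simulated by S_ver with the output forced to F(x_1, x_2).
            transcript.append(simulate_k(k))
    return transcript
```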

The proof that A succeeds with non-negligible probability whenever B succeeds with non-negligible probability follows via a standard hybrid argument. At a high level, note that we can argue about the success probability of A only in the case when A correctly guesses the first execution in which B executes the protocol maliciously. This is because, if B executes the protocol maliciously in some execution q < i, then A simulates the output of the computation (in the offline phase) to be $F(x_1, x_2)$, whereas in the real world the output of the computation may be different, allowing B to distinguish between the real and ideal worlds. Hence, let us consider the case when A correctly guesses the first execution in which B is malicious; this happens with probability $\frac{1}{L}$. Given that B is first malicious in the $i$-th execution, B is honest in the first $i - 1$ executions. In these executions, we can show indistinguishability of the different hybrids (where we replace a real execution with a simulated execution step by step) via a simple hybrid argument, as the only differences between the real and ideal executions are that we replace the encryptions of $R_{1,j}$ in the online phase (for $1 \le j \le n$) with encryptions of all zeroes, and that we simulate the offline phase so that it outputs the value $F(x_1, x_2)$ to the adversary. The first change is indistinguishable due to the semantic security of the FHE scheme. The second change is indistinguishable since the adversary is indeed honest in this execution and an honest execution does evaluate to $F(x_1, x_2)$; the claim then follows from the indistinguishability of the simulated two-party computation protocol from the real protocol with the same output value. Hence, if B succeeds with probability $p_s$, then A succeeds with probability at least $\frac{p_s}{L} - \mathsf{negl}$. We leave further details to the full version of the paper.
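The factor $\frac{1}{L}$ comes from A's guess being independent of B's behavior: whatever distribution B uses for its first malicious execution q, a uniform i on {1, ..., L} matches it with probability exactly $\frac{1}{L}$. The Monte Carlo sketch below (the function `guess_rate` is hypothetical) checks this counting step numerically; it illustrates only the guessing step, not the cryptographic argument.

```python
import random
from typing import Callable


def guess_rate(L: int, first_malicious: Callable[[random.Random], int],
               trials: int, seed: int = 0) -> float:
    """Estimate Pr[A's uniform guess i equals q, the first execution in which B cheats]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        q = first_malicious(rng)  # B's choice, made independently of A's guess
        i = rng.randint(1, L)     # A's guess, uniform on {1, ..., L}
        hits += (i == q)
    return hits / trials
```

Whether B always cheats first in a fixed execution or picks it at random, the estimate stays close to $\frac{1}{L}$.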

E  Multi-party  verifiable  computation 

Let the clients pick the bits $b_i$ at random from a distribution that outputs 1 with probability $\frac{1}{n}$ and 0 otherwise (such a distribution can easily be sampled: simply pick $\log n$ bits uniformly at random and output 1 iff all $\log n$ bits are 1). Let us look at the completeness of the protocol in this case. When all parties are honest, the probability that exactly one $b_i = 1$ and all other $b_j$'s are 0 is $n \cdot \frac{1}{n} \cdot \left(1 - \frac{1}{n}\right)^{n-1}$, which is at least $\frac{1}{e}$. Hence the probability of a completeness error is bounded by $1 - \frac{1}{e}$. So, if the clients repeat the above protocol (in parallel) n times, the probability that none of the repetitions succeeds is at most $\left(1 - \frac{1}{e}\right)^n$, which is negligibly small in n. The clients can check only a run of the protocol that succeeded during the offline verification phase, and obtain the result of the computation.
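The sampler and the completeness calculation above can be checked numerically. This sketch assumes n is a power of two so that $\log n$ is an integer; the function names are hypothetical. It relies on $(1 - 1/n)^{n-1} \ge 1/e$, so n independent repetitions all fail with probability at most $(1 - 1/e)^n$.

```python
import math
import secrets


def sample_biased_bit(log_n: int) -> int:
    """Return 1 iff log_n fresh uniform bits are all 1, i.e. with probability 2**-log_n = 1/n."""
    return int(all(secrets.randbits(1) == 1 for _ in range(log_n)))


def prob_exactly_one(n: int) -> float:
    """Pr[exactly one of n independent Bernoulli(1/n) bits equals 1]."""
    p = 1.0 / n
    return n * p * (1 - p) ** (n - 1)


def prob_all_repetitions_fail(n: int) -> float:
    """Pr[none of n independent parallel repetitions has exactly one b_i = 1]."""
    return (1 - prob_exactly_one(n)) ** n
```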

The problem with this approach is that a set of corrupted clients can claim that their random bits are such that $b_i = 1$ for some corrupted client $D_i$. Now, with constant probability, $b_j$ will be 0 for all honest clients. In this case, the colluding adversarial clients along with the server will have complete knowledge of the bits $b_1, \ldots, b_n$, and hence soundness will be completely defeated.
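To see that this attack succeeds with constant probability: each honest client's bit is 0 with probability $1 - \frac{1}{n}$ independently, so with h honest clients all honest bits are 0 with probability $(1 - \frac{1}{n})^h$, which is at least $(1 - \frac{1}{n})^n \ge \frac{1}{4}$ for $n \ge 2$. A one-function sketch (the name `prob_no_honest_one` is hypothetical):

```python
def prob_no_honest_one(n: int, corrupted: int) -> float:
    """Pr[every honest client's Bernoulli(1/n) bit is 0], with n - corrupted honest clients."""
    honest = n - corrupted
    return (1 - 1.0 / n) ** honest
```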

We get around this problem as follows. We repeat the protocol (in parallel) a total of $2en^2k$ times. But now, a client $D_i$ accepts the output of the computation iff there are at least k repetitions in which $b_i = 1$ and $b_j = 0$ for all $j \neq i$ (this is checked in the secure computation protocol run during the offline phase). Let us first analyze the completeness of this protocol. Note that, for a particular $b_i$, the probability that $b_i = 1$ and all other $b_j$'s are 0 is at least $\frac{1}{en}$. Hence, if we run $2enk$ executions in parallel, then except with negligible probability (in k, via the Chernoff bound) there will be k repetitions in which $b_i = 1$ and $b_j = 0$ for all $j \neq i$. Since we are running $2en^2k$ parallel repetitions, except with negligible probability this condition will be met for all clients $D_i$.
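The completeness argument can be made concrete with an exact binomial tail in place of the Chernoff bound. In this sketch (function names hypothetical), N is the number of parallel repetitions examined for one client and is left as a free parameter; a union bound over the n clients then bounds the probability that some client sees fewer than k good repetitions.

```python
import math


def per_repetition_prob(n: int) -> float:
    """Pr[b_i = 1 and b_j = 0 for all j != i] for one fixed client i; at least 1/(e*n)."""
    p = 1.0 / n
    return p * (1 - p) ** (n - 1)


def prob_fewer_than(k: int, N: int, q: float) -> float:
    """Exact binomial tail Pr[Bin(N, q) < k]."""
    return sum(math.comb(N, x) * q**x * (1 - q) ** (N - x) for x in range(k))


def completeness_failure_bound(n: int, k: int, N: int) -> float:
    """Union bound over the n clients: some client sees fewer than k good repetitions."""
    return min(1.0, n * prob_fewer_than(k, N, per_repetition_prob(n)))
```

For example, with n = 8, k = 20 and $N = \lceil 2enk \rceil$, both the per-client tail and the union bound come out well below 0.01.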

Now, let us analyze the security of this protocol. Note that the adversarial set of clients (a total of $\alpha n$ clients, for a constant $0 < \alpha < 1$) cannot simply set their bits so that one of their b bits is 1 in all parallel repetitions of the protocol. Hence, the adversarial clients must set their bits to 0 in at least $(1 - \alpha)nk$ executions. In these repetitions of the protocol, the adversary has no idea which honest client's bit $b_i$ equals 1 (this can be shown using the same techniques as in the proof of the two-party protocol). Since the adversarial clients can force a wrong output (by colluding with the corrupted worker) only by guessing this, the probability that this happens is negligible in the security parameter n. Hence, a set of adversarial clients succeeds in forcing a wrong output only with negligible probability.
Abstract

We present a novel framework for the description and analysis of secure computation protocols that is at the same time mathematically rigorous and notationally lightweight and concise. The distinguishing feature of the framework is that it allows one to specify (and analyze) protocols in a manner that is largely independent of time, greatly simplifying the study of cryptographic protocols. At the notational level, protocols are described by systems of mathematical equations (over domains), and can be studied through simple algebraic manipulations like substitutions and variable elimination. We exemplify our framework by analyzing in detail two classic protocols: a protocol for secure broadcast, and a verifiable secret sharing protocol, the second of which illustrates the ability of our framework to deal with probabilistic systems, still in a purely equational way.


1  Introduction 

Secure  multiparty  computation  (MPC)  is  a  cornerstone  of  theoretical  cryptography,  and  a  problem 
that  is  attracting  increasingly  more  attention  in  practice  too  due  to  the  pervasive  use  of  distributed 
applications  over  the  Internet  and  the  growing  popularity  of  computation  outsourcing.  The  area 
has  a  long  history,  dating  back  to  the  seminal  work  of  Yao  [29]  in  the  early  1980s,  and  a  steady 
flow  of  papers  contributing  extensions  and  improvements  that  lasts  to  the  present  day  (starting 
with  the  seminal  works  [12,  6,  3]  introducing  general  protocols,  and  followed  by  literally  hundreds  of 
papers).  But  it  is  fair  to  say  that  MPC  has  yet  to  deliver  its  full  load  of  potential  benefits  both  to  the 
applied  and  theoretical  cryptography  research  communities.  In  fact,  large  portions  of  the  research 
community  still  see  MPC  as  a  highly  specialized  research  area,  where  only  the  top  experts  can 
read  and  fully  understand  the  highly  technical  research  papers  routinely  published  in  mainstream 
crypto conferences. So far, two main obstacles have kept MPC from becoming a more widespread tool in both theoretical and applied cryptography: the prohibitive computational cost of executing many MPC protocols, and the inherent complexity of the models used to describe the protocols themselves. Much progress has been made in improving the efficiency of the first protocols [29, 12, 6] in a variety of models and with respect to several complexity measures, even leading to concrete implementations (cf., e.g., [19, 4, 17, 7, 24, 11]). However, the underlying models used to describe and analyze security properties are still rather complex.

What makes MPC harder to model than traditional cryptographic primitives like encryption is the inherently distributed nature of the security task being addressed: there are several distinct
and  mutually  distrustful  parties  trying  to  perform  a  joint  computation,  in  such  a  way  that  even  if 
some  parties  deviate  from  the  protocol,  the  protocol  still  executes  in  a  robust  and  secure  way. 

The  difficulty  of  properly  modeling  secure  distributed  computation  is  well  recognized  within  the 
cryptographic  community,  and  documented  by  several  definitional  papers  attempting  to  improve 
the current state of the art [22, 9, 10, 23, 2, 18, 15, 20]. Unfortunately, the current state of the art is still far from satisfactory, with definitional/modeling papers easily reaching encyclopedic page counts, setting a very high barrier of entry for most cryptographers to contribute to or actively follow the developments in MPC research. Moreover, most MPC papers are written in a semi-formal style reflecting an uncomfortable trade-off between the desire to give the subject the rigorous treatment it deserves and the implicit acknowledgment that this is just not feasible using the currently available formalisms (even more so within the page constraints of a typical conference or even journal publication). At the other end, recent attempts to introduce abstractions [20] are too high level to deliver a precise formal language for protocol specification. The goal of this paper is to drastically change this state of affairs by putting forward a model for the study of MPC protocols that is concise and rigorous, yet still firmly rooted in the intuitive ideas that pervade most past work on secure computation and that most cryptographers know and love.

1.1  The  simulation  paradigm 

Let us recall the well-known and established simulation paradigm that underlies essentially all MPC security definitions. Cryptographic protocols are typically described by several component programs $P_1, \ldots, P_n$ executed by n participating parties, interconnected by a communication network N, and usually rendered by a diagram similar to the one in Figure 1 (left): each party receives some input $x_i$ from an external environment, and sends/receives messages $s_i, r_i$ from the network. Based on the external inputs $x_i$ and the messages $s_i, r_i$ transmitted over the network N, each party produces some output value $y_i$, which is returned to the outside world as the visible output of running the protocol. The computational task that the protocol is trying to implement is described by a single monolithic program F, called the “ideal functionality”, which has the same input/output interface as the system consisting of $P_1, \ldots, P_n$ and N, as shown in Figure 1 (right). Conceptually, F is executed by a centralized entity that interacts with the individual parties through their local input/output interfaces $x_i, y_i$, and processes the data in a prescribed and trustworthy manner. A protocol $P_1, \ldots, P_n$ correctly implements functionality F in the communication model provided by N if the two systems depicted in Figure 1 (left and right) exhibit the same input/output behavior.
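As a toy illustration of correctness in this sense (same input/output behavior), consider an ideal functionality that returns the sum of all inputs to every party, and a protocol in which each party additively secret-shares its input over the network. This example is ours rather than the paper's, and all names and the modulus `MOD` are hypothetical; it only checks input/output equivalence and says nothing about security against corrupted parties.

```python
import secrets
from typing import List

MOD = 2**32  # hypothetical modulus for the toy example


def ideal_F(inputs: List[int]) -> List[int]:
    """Ideal functionality: a trusted party returns the sum of all inputs to every party."""
    total = sum(inputs) % MOD
    return [total] * len(inputs)


def share(x: int, n: int) -> List[int]:
    """Split x into n additive shares modulo MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % MOD)
    return parts


def real_protocol(inputs: List[int]) -> List[int]:
    """Parties P_1..P_n exchanging messages over a network N (modeled here as plain lists)."""
    n = len(inputs)
    sent = [share(x, n) for x in inputs]  # sent[i][j]: message from P_i to P_j
    partial = [sum(sent[i][j] for i in range(n)) % MOD for j in range(n)]
    total = sum(partial) % MOD  # every P_j broadcasts its partial sum
    return [total] * n
```

Since the shares of each input sum to that input, the partial sums add up to the total, so `real_protocol` and `ideal_F` agree on every input vector even though the individual shares are random.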

Of course, this is not enough for the protocol to be secure. In a cryptographic context, some parties can get corrupted, in which case an adversary (modeled as part of the external execution environment) gains direct access to the parties’ communication channels and is not bound to follow the instructions of the protocol programs $P_i$. Figure 2 (left) shows an execution where $P_3$ and $P_4$ are corrupted. The simulation paradigm postulates that whatever can be achieved by a concrete adversary attacking the protocol can also be achieved by an idealized adversary S (called the simulator) attacking the ideal functionality F. In Figure 2 (right), the simulator takes over the role of $P_3$ and $P_4$, communicating for them with the ideal functionality, and recreating the attack of a real adversary by emulating an interface that exposes the network communication channels of $P_3$
Figure 1: A multiparty protocol $P_1, \ldots, P_4$ with communication network N (left) implementing a functionality F (right).




Figure 2: Simulation-based security. The protocol $P_1, \ldots, P_4$ is a secure implementation of functionality F if the system (left) exposed to an adversary that corrupts a subset of parties (say $P_3$ and $P_4$) is indistinguishable from the system (right) recreated by a simulator S interacting with F.

and $P_4$. The protocol $P_1, \ldots, P_n$ securely implements functionality F if the systems described on the left and right of Figure 2 are functionally equivalent: no adversary (environment) connecting to the external channels $x_1, y_1, x_2, y_2, s_3, r_3, s_4, r_4$ can (efficiently) determine whether it is interacting with the system described in Figure 2 (left) or the one in Figure 2 (right). In other words, anything that can be achieved by corrupting a set of parties in a real protocol execution can also be emulated by corrupting the same set of parties in an idealized execution where the protocol functionality F is executed by a trusted party in a perfectly secure manner.

This is a very powerful idea, inspired by the seminal work on zero-knowledge proof systems [13], and embodied in many subsequent papers about MPC. But, of course, as evocative as the diagrams in Figure 2 may be, they fall short of providing a formal definition of security. In fact, a similar picture can be drawn to describe, at a very abstract level, essentially any of the secure multiparty computation models proposed so far, but the real work is in the definition of what the blocks and the communication links connecting them actually represent. Traditionally, building on classical work from computational complexity on interactive proof systems, MPC is formalized by modeling each block as an interactive Turing machine (ITM), a venerable model of sequential
computation  extended  with  some  “communication  tapes”  used  to  model  the  channels  connecting 
the  various  blocks.  Unfortunately,  this  only  provides  an  adequate  model  for  the  local  computation 
performed  by  each  component  block,  leaving  out  the  most  interesting  features  that  distinguish  MPC 
from  simpler  cryptographic  tasks:  computation  is  distributed  and  the  concurrent  execution  of  all 
ITMs  needs  to  be  carefully  orchestrated.  In  a  synchronous  communication  environment,  where 
local  computations  proceed  in  lockstep  through  a  sequence  of  rounds,  and  messages  are  exchanged 
only  between  rounds,  this  is  relatively  easy.  But  in  asynchronous  communication  environments  like 
the  Internet,  dealing  with  concurrency  is  a  much  trickier  business.  The  standard  approach  to  deal 
with  concurrency  in  asynchronous  distributed  systems  is  to  use  nondeterminism:  a  system  does  not 
describe  a  single  behavior,  but  a  set  of  possible  behaviors  corresponding  to  all  possible  interleavings 
and  message  delivery  orders.  But  nondeterminism  is  largely  incompatible  with  cryptography,  as  it 
allows one to break any cryptographic function by nondeterministically guessing the value of a secret key. As a result, cryptographic models of concurrent execution resort to an adversarially and adaptively chosen, but deterministic, message delivery order: whenever a message is scheduled for transmission between two components, it is simply queued and an external scheduling unit (which is
also  modeled  as  part  of  the  environment)  is  notified  about  the  event.  While  providing  a  technically 
sound  escape  route  from  the  dangers  of  mixing  nondeterministic  concurrency  with  cryptography, 
this  approach  has  several  shortcomings: 

- Adding a scheduler further increases the complexity of the system, making simulation-based proofs of security even more technical.

-  It  results  in  a  system  that  in  many  respects  seems  overspecified:  as  the  goal  is  to  design  a 
robust  system  that  exhibits  the  prescribed  behavior  in  any  execution  environment,  it  would 
seem  more  desirable  to  abstract  the  scheduling  away,  rather  than  specifying  it  in  every  single 
detail  of  a  fully  sequential  ordering  of  events. 

-  Finally,  the  intuitive  and  appealing  idea  conveyed  by  the  diagrams  in  Figure  2  is  in  a  sense 
lost,  as  the  system  is  now  more  accurately  described  by  a  collection  of  isolated  components 
all  connected  exclusively  to  the  external  environment  that  orchestrates  their  executions  by 
scheduling  the  messages. 
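The queued, externally scheduled delivery model described above can be sketched as follows. The class and its methods are hypothetical: every send is appended to a queue and reported to the environment, and the scheduler function `pick` must deterministically choose the next message to deliver. Spelling out `pick` is exactly the full sequential ordering of events that the list above calls overspecified.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, payload)


class ScheduledNetwork:
    """Sketch: sends are queued and the scheduler, modeled as part of the
    environment, deterministically picks which pending message is delivered next."""

    def __init__(self, pick: Callable[[List[Message]], int]) -> None:
        self.pending: List[Message] = []
        self.notifications: List[Tuple[str, str]] = []
        self.pick = pick

    def send(self, sender: str, receiver: str, payload: str) -> None:
        self.pending.append((sender, receiver, payload))
        self.notifications.append((sender, receiver))  # the environment learns of the event

    def deliver_next(self) -> Message:
        # The index returned by the scheduler fixes the delivery order completely.
        return self.pending.pop(self.pick(self.pending))
```

For instance, a scheduler that always returns the last index delivers the newest message first, while one that always returns 0 yields first-in, first-out delivery.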

1.2  Our  work 

In this paper we describe a model of distributed computation that retains the simplicity and intuitiveness conveyed by the diagrams in Figures 1 and 2, while still being both mathematically rigorous and concise. In other words, we seek a model where the components $P_i$, N, F, S occurring in the description and analysis of a protocol, and the systems obtained by interconnecting them, can be given a simple and precise mathematical meaning. The operation of composing systems together should also be well defined and satisfy a number of useful and intuitive properties, e.g., the result of connecting several blocks together does not depend on the order in which the connections are made (just as we expect the meaning of a diagram to be independent of the order in which the diagram was drawn). Finally, it should provide a solid foundation for equational reasoning, in the sense that a system can be replaced by an equivalent one in any context.

Within  such  a  framework,  the  proof  that  protocols  can  be  composed  together  should  be  as  simple 
as  the  following  informal  argument.  (In  fact,  given  the  model  formally  defined  in  the  rest  of  the 
Figure 3: A protocol $Q_i$ implementing G in the F-hybrid model.


paper, the following is actually a rigorous proof that our definition satisfies a universal composability property.) Say we have a protocol $P_1, \ldots, P_n$ securely implementing ideal functionality F using a communication network N, and also a protocol $Q_1, \ldots, Q_n$ in the F-hybrid model (i.e., an idealized model where parties can interact through functionality F) that securely implements functionality G. The security of the second protocol is illustrated in Figure 3.

Then, the protocol obtained simply by connecting $P_i$ and $Q_i$ together is a secure implementation of G in the standard communication model N. Moreover, the simulator showing that the composed protocol is secure is easily obtained simply by composing the simulators for the two component protocols. In other words, we want to show that an adversary attacking the real system described in Figure 4 (left) is equivalent to the composition of the simulators attacking the ideal functionality G as described in Figure 4 (right).

This is easily shown by transforming Figure 4 (left) into Figure 4 (right) in two steps, going through the hybrid system described in Figure 5. Specifically, first we use the security of $P_i$ to replace the system described in Figure 2 (left) with the one in Figure 2 (right). This turns the system in Figure 4 (left) into the equivalent one in Figure 5. Next we use the security of $Q_i$ to substitute the system in Figure 3 (left) with the one in Figure 3 (right). This turns Figure 5 into Figure 4 (right).

The framework proposed in this paper allows one to work with complex distributed systems with the same simplicity as the informal reasoning described in this section, while being quite powerful and flexible. For example, it allows one to model not only protocols that are universally composable, but also protocols that retain their security only when used in restricted contexts. For simplicity, in this paper we focus on perfectly secure protocols against unbounded adversaries, as this already allows us to describe interesting protocols that illustrate the most important feature of our framework: the ability to design and analyze protocols without explicitly resorting to the notion of time and sequential scheduling of messages. Moreover, within the framework of universal composability, it is quite common to design perfectly secure protocols in a hybrid model that offers idealized versions of the cryptographic primitives, and then to resort to computationally secure cryptographic primitives only to realize the hybrid model. So, a good model for the analysis of perfect or statistical security can already be a useful and usable aid for the design of more general computationally secure protocols. Natively extending our framework to statistically or computationally secure protocols is also an attractive possibility. We consider the perfect/statistical/computational security dimension
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Figure  4:  Protocol  composition.  Security  is  proved  using  a  hybrid  argument. 
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Figure  5:  Hybrid  system  to  prove  the  security  of  the  composed  protocol 
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as  being  mostly  orthogonal  to  the  issues  dealt  with  in  this  paper,  and  we  believe  the  model  described 
here  offers  a  solid  basis  for  extensions  in  that  direction. 

1.3  Techniques 

In order to realize our vision, we introduce a computational model in which security proofs can be carried out without explicitly dealing with the notion of time. Formally, we associate to each communication channel connecting two components the set of all possible “channel histories”, partially ordered according to their information content or temporal ordering. The simplest example is the set of all finite sequences $M^*$ of messages from some underlying message space $M$, ordered according to the prefix ordering relation. The components of the system are then modeled as functions mapping input histories to output histories. The functions are subject to some natural conditions, e.g., monotonicity: receiving more input values can only result in more output values being produced. Under appropriate technical conditions on the ordered sets associated to the communication channels, and on the functions modeling the computations performed by the system components, this results in a well-behaved framework, where components can be connected together, even forming loops, always resulting in a unique and well-defined function describing the input/output behavior of the whole system. Previous approaches to model interactive systems, such as Kahn networks [16] and Maurer’s random systems [21], can indeed be seen as special cases of our general process model.¹ The resulting model is quite powerful, even allowing one to model probabilistic computation as a special case. However, the simplicity of the model has a price: all components of the system must be monotone with respect to the information ordering relation. For example, if a program $P$ on input messages $x_1, x_2$ outputs $P(x_1, x_2) = (y_1, y_2, y_3)$, then on input $x_1, x_2, x_3$ it can only output a sequence of messages that extends $(y_1, y_2, y_3)$ with more output. In other words, $P$ cannot “go back in time” and change $y_1, y_2, y_3$. While this is a very natural and seemingly innocuous restriction, it also means that the program run by $P$ cannot perform operations of the form “if no input message has been received yet, then send $y$”. This is because if an input message is received at a later point, $P$ cannot go back in time and not send $y$.
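The monotonicity requirement, and the reason time-dependent rules violate it, can be checked mechanically on small examples. The following Python sketch is our own illustration and not part of the framework: it models channel histories as tuples under the prefix order and tests monotonicity by brute force over short histories. The helper names (`is_prefix`, `is_monotone`) and both toy processes are hypothetical; `send_y_if_silent` implements the rule “if no input message has been received yet, then send y”.

```python
from itertools import product

def is_prefix(x: tuple, y: tuple) -> bool:
    """Prefix order on finite sequences: x <= y iff y = x|z for some z."""
    return len(x) <= len(y) and y[:len(x)] == x

def histories(alphabet, max_len):
    """All finite sequences over `alphabet` of length at most `max_len`."""
    for k in range(max_len + 1):
        yield from product(alphabet, repeat=k)

def is_monotone(process, alphabet, max_len) -> bool:
    """Check that x <= y implies process(x) <= process(y) on all small histories."""
    hs = list(histories(alphabet, max_len))
    return all(is_prefix(process(x), process(y))
               for x in hs for y in hs if is_prefix(x, y))

def increment_all(x: tuple) -> tuple:
    """Reacts to every received message and never retracts output."""
    return tuple(m + 1 for m in x)

def send_y_if_silent(x: tuple) -> tuple:
    """Time-dependent rule: output 'y' only while no input has arrived."""
    return ("y",) if len(x) == 0 else ()

print(is_monotone(increment_all, [0, 1], 3))     # True
print(is_monotone(send_y_if_silent, [0, 1], 3))  # False
```

The second check fails because the empty history is a prefix of every history, yet the output `("y",)` is not a prefix of the empty output produced once a message arrives.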

It is our thesis that these time-dependent operations make cryptographic protocols harder to understand and analyze, and therefore should be avoided whenever possible.

Organization. The rest of the paper is organized as follows. In Section 2 we present our framework for the description and analysis of concurrent processes, and illustrate the definitions using a toy example. Next, we demonstrate the applicability of the framework by carefully describing and analyzing two classic cryptographic protocols: secure broadcast (in Section 3) and verifiable secret sharing (in Section 4). The secure broadcast protocol in Section 3 is essentially the one of Bracha, and only uses deterministic functions. Our modular analysis of the protocol illustrates the use of subprotocols that are not universally composable. The verifiable secret sharing protocol analyzed in Section 4 provides an example of a randomized protocol.

¹ For readers well versed in the subject, we remark that our model can be regarded as a generalization of Kahn networks where the channel behaviors are elements of arbitrary partially ordered sets (or, more precisely, domains) rather than simple sequences of messages. This is a significant generalization that allows one to deal with probabilistic computations and intrinsically nondeterministic systems seamlessly, without running into the Brock-Ackerman anomaly and similar problems.
2  Distributed  Systems,  Composition,  and  Secure  Computation


In this section we introduce our mathematical framework for the description and analysis of distributed systems. We start with a high-level description of our approach, which will be sufficient to apply our framework and to follow the proofs. We then give more foundational details justifying the soundness of our approach. Finally, we provide security definitions for protocols in our framework.

2.1  Processes  and  systems 

Introducing processes: An example and notational conventions. Our framework models (asynchronous and reactive) processes and systems with one or more input and output channels as mathematical functions mapping input histories to output histories. Before introducing more formal definitions, let us illustrate this concept with a simple example. Consider a deterministic process with one input and one output channel, which receives as input a sequence of messages $x[1], \ldots, x[k]$, where each $x[i]$ is (or can be parsed as) an integer. The process receives the messages sequentially, one at a time, and in order to make the process finite one may assume that the process will accept only the first $n$ messages. Upon receiving each input message $x[i]$, the process increments the value and immediately outputs $x[i] + 1$. It is not hard to model the process in terms of a function mapping input to output sequences: the input and output of the function modeling the process are the set $\mathbb{Z}^{\le n}$ of integer sequences of length at most $n$, and the process is described by the function $F\colon \mathbb{Z}^{\le n} \to \mathbb{Z}^{\le n}$ mapping each input sequence $x \in \mathbb{Z}^k$ (for some $k \le n$) to the output sequence $y \in \mathbb{Z}^k$ of the same length defined by the equations $y[i] = x[i] + 1$ (for $i = 1, \ldots, k$). There are multiple ways one can describe such a function. We describe the process in equational form as in Figure 6 (left). In the example, the first line assigns names to the function and to its input and output variables, while the remaining lines are equations that define the value of the output variables in terms of the input variables. Each variable ranges over a specific set ($x, y \in \mathbb{Z}^{\le n}$), but for simplicity we often leave the specification of this set implicit, as it is usually clear from the context. By convention, all variables that appear in the equations, but not as part of the input or output variables, are considered local/internal variables, whose only purpose is to help define the value of the output variables in terms of the input variables. Free index variables (e.g., $i, j$) are universally quantified (over appropriate ranges) and used to compactly describe sets of similar equations.
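As a concrete reading of the equations for $F$, the following Python sketch represents a sequence in $\mathbb{Z}^{\le n}$ as a tuple of integers. The bound `N = 5` and the function name are illustrative assumptions, not values fixed by the text.

```python
N = 5  # illustrative bound n on the number of accepted messages (assumption)

def F(x: tuple, n: int = N) -> tuple:
    """Process F(x) = y defined by the equations y[i] = x[i] + 1, i = 1..k."""
    if len(x) > n:
        raise ValueError("input must lie in Z^{<=n}")
    # Python indexes from 0 while the equations index from 1; since the
    # map is applied position by position, the shift does not matter.
    return tuple(xi + 1 for xi in x)

print(F((3, -1, 7)))  # (4, 0, 8)
```

The output always has the same length $k$ as the input, matching $y \in \mathbb{Z}^k$.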

Processes as monotone functions. In general, the reason we define a process $F$ as a function mapping sequences to sequences² (rather than, say, as a function $f(x) = x + 1$ applied to each incoming message $x$) is that it allows us to describe the most general type of (e.g., stateful, reactive) process, whose output is a function of all messages received as input during the execution of the protocol. (Note that we do not need to model state explicitly.) Also, such functions can describe processes with multiple input and output channels by letting inputs and outputs be tuples of message sequences. However, clearly, not every function mapping input to output sequences can be a valid process. To capture the functions that validly represent a process, input and output sets are endowed with a partial ordering relation $\le$, where $x \le y$ means that $y$ is a possible future of $x$. (In the case of sequences of messages, $\le$ is the standard prefix partial ordering relation, where $x \le y$ if $y = x|z$ for some other sequence $z$, and $x|z$ is the concatenation of the two sequences.) Functions describing processes should be naturally restricted to monotone functions, i.e., functions such that $x \le y$ implies $F(x) \le F(y)$. In our example, this simply means that if $F(x)$ is produced as output on input a sequence of messages $x$, then upon receiving additional messages $z$ the output sequence can only get longer, i.e., $F(y) = F(x|z) = F(x)|z'$ for some $z'$. In other words, once the messages $F(x)$ are sent out, the process cannot change its mind and set the output to a sequence that does not start with $F(x)$.

² Here we use sequences just as a concrete example. Our framework uses more general structures, namely domains.

Figure 6: Some simple processes
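The condition $F(x|z) = F(x)|z'$ can be verified exhaustively for the increment process on short sequences. In this Python sketch (the helper name `extension_witness` is hypothetical), tuple concatenation plays the role of $x|z$; for this particular $F$ the witness turns out to be $z' = F(z)$.

```python
from itertools import product

def F(x: tuple) -> tuple:
    """The increment process of Figure 6 (left): y[i] = x[i] + 1."""
    return tuple(m + 1 for m in x)

def extension_witness(x: tuple, z: tuple):
    """Return z' with F(x|z) == F(x)|z', or None if F(x) is not a prefix of F(x|z)."""
    fx, fxz = F(x), F(x + z)  # tuple + tuple is the concatenation x|z
    return fxz[len(fx):] if fxz[:len(fx)] == fx else None

# Exhaustive check over sequences of length <= 2 with entries in {-1, 0, 1}:
# a witness always exists, and here it is simply z' = F(z).
small = [s for k in range(3) for s in product(range(-1, 2), repeat=k)]
assert all(extension_witness(x, z) == F(z) for x in small for z in small)
```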

Note that so far we have only discussed an example of a deterministic process. Below, after introducing some further foundational tools, we will see that probabilistic processes are captured in the same way by letting the function output be a distribution over sequences, rather than a single sequence of symbols.

Further examples and notational conventions. In the examples, $|x|$ denotes the length of a sequence $x$, and we use array notation $x[i]$ to index the elements of a sequence. Figure 6 gives two more examples of processes that further illustrate notational conventions. Process $G$ in Figure 6 (middle) takes an input $y$ (as usual, in $\mathbb{Z}^{\le n}$) and simply duplicates it, copying the input messages to two different output channels $z$ and $w$. When input or output values are tuples, we usually give separate names to each component of the tuple. As before, all variables take values in $\mathbb{Z}^{\le n}$ and the output values are defined by a set of equations that express the output in terms of the input. Finally, process $H(z)$ takes as input a sequence $z \in \mathbb{Z}^{\le n}$, and outputs the message $1$ followed by the messages $z$ received as input, possibly truncated to a prefix $z[<n]$ of length at most $n - 1$, so that the output sequence $x$ has length at most $n$.
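Processes $G$ and $H$ can be sketched the same way, again treating sequences as Python tuples. The bound `N = 4` is an illustrative assumption; truncating `z` to its first `n - 1` entries implements the prefix $z[<n]$.

```python
N = 4  # illustrative bound n (assumption)

def G(y: tuple) -> tuple:
    """G(y) = (z, w): copy the input channel y to two output channels."""
    z = y
    w = y
    return z, w

def H(z: tuple, n: int = N) -> tuple:
    """H(z) = x with x[1] = 1 and x[j+1] = z[j] for j < n."""
    return (1,) + z[: n - 1]

print(G((5, 6)))        # ((5, 6), (5, 6))
print(H((5, 6, 7, 8)))  # (1, 5, 6, 7)
```

Note that $H$ emits the message 1 even on the empty input; this is consistent with monotonicity, since that output is a prefix of every later output.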

Process composition. Processes are composed in the expected way, connecting some output variables to other input variables. Here we use the convention that variable names are used to implicitly specify how different processes are meant to be connected together.³ Composing two processes together yields, in turn, another process, which is obtained simply by combining all the equations. We often refer to the resulting process as a system to stress its structure as a composition of basic processes. However, it should be noted that both a process and a system are objects of the same mathematical type, namely monotone functions described by systems of equations. For example, composing $G$ and $H$ from Figure 6 yields the process $[G \mid H]$ shown in Figure 7 (left), with input $y$ and output $(w, x)$, where $w = y$ replicates the input to make it externally visible. We use the convention that by default processes are connected by private channels, not visible outside of the system. This is modeled by turning their common input/output variables into local ones, not part of the input or output of the composed system. Of course, one can always either override this convention by explicitly listing such common input/output variables as part of the output, or bypass it by duplicating the value of a variable as done, for example, by process $G$.

³ We stress that this is just a notational convention, and there are many other syntactical mechanisms that can be used to specify the “wiring” in more or less explicit ways.
Figure 7 (left):

$$[G \mid H](y) = (w, x):\quad z = y,\quad w = y,\quad x[1] = 1,\quad x[j+1] = z[j]\ \ (j < n)$$

Figure 7 (right):

$$[G \mid H](y) = (w, x):\quad w = y,\quad x[1] = 1,\quad x[j+1] = y[j]\ \ (j < n)$$

Figure 7: Process composition


This  is  just  a  syntactical  convention,  and  several  other  choices  are  possible,  including  never  hiding 
variables  during  process  composition  and  introducing  a  special  projection  operator  to  hide  internal 
variables. 

Since  processes  formally  define  functions  (from  input  to  output  variables),  and  equations  are  just 
a  syntactic  method  to  specify  functions,  equations  can  be  simplified  without  affecting  the  process. 
Simplifications  are  easily  performed  by  substitution  and  variable  elimination.  For  example,  using 
the  first  equation  z  =  y,  one  can  substitute  y  for  z,  turning  the  last  equation  in  the  system  into 
$x[i + 1] = y[i]$. At this point, the local variable z is no longer used anywhere, and its defining
equation  can  be  removed  from  the  system.  The  result  is  shown  in  Figure  7  (right).  We  remark 
that  the  two  systems  of  equations  shown  in  Figure  7  define  the  same  process:  they  have  the  same 
input  and  output  variables,  and  the  equations  define  precisely  the  same  function. 
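The claim that both systems in Figure 7 define the same function can be confirmed by brute force on a small input space. This Python sketch uses an illustrative bound `N = 3` and hypothetical function names; `GH_composed` wires $G$'s output $z$ into $H$ and hides it, while `GH_simplified` implements the reduced equations.

```python
from itertools import product

N = 3  # illustrative bound n (assumption)

def G(y):
    """G(y) = (z, w) with z = y and w = y."""
    return y, y

def H(z, n=N):
    """H(z) = x with x[1] = 1 and x[j+1] = z[j] (j < n)."""
    return (1,) + z[: n - 1]

def GH_composed(y):
    """[G | H](y) = (w, x): connect G's output z to H's input; z stays internal."""
    z, w = G(y)
    x = H(z)
    return w, x

def GH_simplified(y, n=N):
    """Figure 7 (right): w = y, x[1] = 1, x[j+1] = y[j] (j < n)."""
    return y, (1,) + y[: n - 1]

# Every sequence of length <= N over {0, 1, 2}.
inputs = [s for k in range(N + 1) for s in product(range(3), repeat=k)]
assert all(GH_composed(y) == GH_simplified(y) for y in inputs)
```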

Feedback loops and recursive equations. Now consider the composition of all three processes $F, G, H$ from Figure 6. Composition can be performed one pair at a time, and in any order, e.g., as $[[F \mid G] \mid H]$ or $[F \mid [G \mid H]]$. Given the appropriate mathematical definitions, it can be easily shown that the result is the same, independent of the order of composition. (This is clear at the syntactic level, where process composition is simply defined by combining all the equations together. But associativity of composition can also be proved at the semantic level, where the objects being combined are functions.) So, we write $[F \mid G \mid H]$ to denote the result of composing multiple processes together, shown in Figure 8 (left). When studying multi-party computation protocols, one is naturally led to consider collections of processes, e.g., $P_1, \ldots, P_n$, corresponding to the individual programs run by each participant. Given a collection $\{P_i\}_i$ and a subset of indices $I \subseteq \{1, \ldots, n\}$, we write $P_I$ to denote the composition of all $P_i$ with $i \in I$. Similarly, we use $x_A$ or $x[A]$ to denote a vector indexed by $i \in A$. As a matter of notation, we also use $x^A$ to denote the $|A|$-dimensional vector indexed by $i \in A$ with all components set equal to $x$.

The  system  [F  |  G  |  H]  has  no  inputs,  and  only  one  output  w.  More  interestingly,  the  result  of 
composing  all  three  processes  yields  a  recursive  system  of  equations,  where  y  is  a  function  of  x,  x  is 
a  function  of  z  and  z  is  a  function  of  y.  Before  worrying  about  solving  the  recursion,  we  can  simplify 
the  system.  A  few  substitutions  and  variable  eliminations  yield  the  system  in  Figure  8  (right).  The 
system consists of a single, recursively defined output variable $w$. The recursive definition of $w$ is easy to solve, yielding $w[i] = i + 1$ for $i \le n$.
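One way to "solve the recursion" operationally is to start with every channel empty (the bottom element) and repeatedly re-evaluate all equations until nothing changes. This is the least-fixed-point iteration that Section 2.2 justifies formally. The following Python sketch applies it to $[F \mid G \mid H]$ with an illustrative bound `N = 5`; the names `step` and `least_fixed_point` are hypothetical.

```python
N = 5  # illustrative bound n (assumption)

def step(state, n=N):
    """One simultaneous update of all equations of [F | G | H]."""
    x, y, z = state
    new_y = tuple(xi + 1 for xi in x)  # F: y[i] = x[i] + 1
    new_z = y                          # G: z = y (and w = y)
    new_x = (1,) + z[: n - 1]          # H: x[1] = 1, x[j+1] = z[j]
    return new_x, new_y, new_z

def least_fixed_point(n=N):
    """Iterate from bottom (all channels empty) until the state stabilizes."""
    state = ((), (), ())
    while True:
        nxt = step(state, n)
        if nxt == state:
            return state
        state = nxt

x, y, z = least_fixed_point()
w = y
print(w)  # (2, 3, 4, 5, 6)
```

Each iteration can only extend the channel histories, and their lengths are bounded by $n$, so the loop terminates with $w[i] = i + 1$ for $i \le n$.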

2.2  Foundations:  Domain  theory  and  probabilistic  processes 

So far, equations have been treated in an intuitive and semi-formal way, and in fact obtaining an intuitive and lightweight framework is one of our main objectives. But for the approach to be sound, it is important that the equations and the variable symbols manipulated during the design and analysis of a system be given a precise mathematical meaning. Also, we want to consider a more general model of processes where inputs and outputs are not restricted to simple sequences of messages, but can be more complex objects, including probability distributions. This requires us to introduce some further formal tools.

Figure 8 (left):

$$[F \mid G \mid H]() = w:\quad y[i] = x[i] + 1\ \ (i \le n),\quad z = y,\quad w = y,\quad x[1] = 1,\quad x[j+1] = z[j]\ \ (j < n)$$

Figure 8 (right):

$$[F \mid G \mid H]() = w:\quad w[1] = 2,\quad w[j+1] = w[j] + 1\ \ (j < n)$$

Figure 8: Example of recursive process

The standard framework to give a precise meaning to our equations is that of domain theory, a well-established area of computer science developed decades ago to give a solid foundation to functional programming languages [27, 26, 14, 28, 1]. Offering a full introduction to domain theory is beyond the scope of this paper, but in order to reassure the reader that our framework is sound, we recall the most basic notions and illustrate how they apply to our setting.

Domains and partial orders. Domains are a special kind of partially ordered set satisfying certain technical properties. We recall that a partially ordered set (or poset) $(X; \le)$ is a set $X$ together with a reflexive, transitive and antisymmetric relation $\le$. We use posets to model the set of possible histories (or behaviors) of communication channels, with the partial order relation corresponding to temporal evolution. For example, a channel that allows the transmission of an arbitrary number of messages from a basic set $M$ (and that preserves the order of transmitted messages) can be modeled by the poset $(M^*; \le)$ of finite sequences of elements of $M$ together with the prefix partial ordering relation $\le$. A chain $x_1 \le x_2 \le \cdots \le x_n$ represents a sequence of observations at different points in time.⁴ In this paper we will extensively use an even simpler poset $M_\perp$, consisting of the base set $M$ extended with a special “bottom” element $\perp$, and the flat partial order where $x \le y$ if and only if $x = \perp$ or $x = y$. The poset $M_\perp$ is used to model a communication channel that allows the transmission of a single message from $M$, with the special value $\perp$ representing a state in which no message has been sent yet.
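The flat order on $M_\perp$ is easy to encode and to check against the poset axioms. In this Python sketch, `None` stands in for $\perp$ (an assumption that requires `None` not to be a message in $M$), and the two-element base set is an illustrative choice.

```python
from itertools import product

BOT = None  # stands for the bottom element ⊥ ("no message sent yet")

def flat_leq(x, y) -> bool:
    """Flat order on M_⊥: x <= y iff x is ⊥ or x == y."""
    return x is BOT or x == y

def is_partial_order(elems, leq) -> bool:
    """Check reflexivity, antisymmetry and transitivity on a finite set."""
    refl = all(leq(a, a) for a in elems)
    anti = all(a == b for a, b in product(elems, repeat=2)
               if leq(a, b) and leq(b, a))
    trans = all(leq(a, c) for a, b, c in product(elems, repeat=3)
                if leq(a, b) and leq(b, c))
    return refl and anti and trans

M_bot = [BOT, "m0", "m1"]  # base set M = {m0, m1} extended with ⊥
print(is_partial_order(M_bot, flat_leq))          # True
print(flat_leq(BOT, "m0"), flat_leq("m0", "m1"))  # True False
```

Distinct messages are incomparable: once a single-message channel carries `m0`, it can never evolve into `m1`.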

The Scott topology and continuity. Posets can be endowed with a natural topology, called the Scott topology, that plays an important role in many definitions. In the case of posets $(X; \le)$ with no infinite chains, closed sets can be simply defined as sets $C \subseteq X$ that are downward closed, i.e., if $x \in C$ and $y \le x$, then $y \in C$. Intuitively, a set is closed if it contains all possible “pasts” that lead to a current set of events. Open sets are defined as usual as the complements of closed sets. It is easy to see that the standard (topological⁵) definition of continuous function $f\colon X \to Y$ (according to the Scott topology on posets with no infinite chains) boils down to requiring that $f$ is monotone, i.e., for all $x, y \in X$, if $x \le y$ in $X$, then $f(x) \le f(y)$ in $Y$. In the case of posets with infinite chains, such definitions are slightly more complex, and require the definition of limits of infinite chains. For any poset $(X; \le)$ and subset $A \subseteq X$, $x \in X$ is an upper bound on $A$ if $x \ge a$ for all $a \in A$. The value $x$ is called the least upper bound of $A$ if it is an upper bound on $A$, and any other upper bound $y$ satisfies $x \le y$. Informally, if $A = \{a_i \mid i = 1, 2, \ldots\}$ is a chain $a_1 \le a_2 \le a_3 \le \cdots$, and $A$ admits a least upper bound (denoted $\bigvee A$), then we think of $\bigvee A$ as the limit of the monotonically increasing sequence $A$. (In our setting, where the partial order models temporal evolution, the limit corresponds to the value of the variable describing the entire channel history once the protocol has finished executing.) All Scott domains (and all posets used in this paper) are complete partial orders (or CPOs), i.e., posets such that all chains $A \subseteq X$ admit a least upper bound. CPOs have a minimal element $\perp = \bigvee \emptyset$, which satisfies $\perp \le x$ for all $x \in X$. Closed sets $C \subseteq X$ of arbitrary CPOs $X$ are defined by requiring $C$ to be also closed under limits, i.e., for any chain $Z \subseteq C$ it must be that $\bigvee Z \in C$. (Open sets are always defined as the complements of closed sets.) Similarly, continuous functions $f\colon X \to Y$ between CPOs should preserve limits, i.e., any chain $Z \subseteq X$ must satisfy $f(\bigvee Z) = \bigvee f(Z)$.

⁴ Domain theory usually resorts to the (related, but more general) notion of directed set. But not much is lost by restricting the treatment to chains, which are perhaps more intuitive to use in our setting.

⁵ We recall that a function $f\colon X \to Y$ between two topological spaces is continuous if the preimage $f^{-1}(O)$ of any open set $O \subseteq Y$ is also open.
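The statement that, on posets without infinite chains, topological continuity coincides with monotonicity can be checked exhaustively on a tiny example. This Python sketch enumerates all 27 maps from the flat poset $\{a, b\}_\perp$ to itself (an illustrative choice of poset, with `None` standing for $\perp$), computes the open sets as upward-closed sets, and compares the two conditions.

```python
from itertools import chain, combinations, product

BOT = None               # bottom element ⊥
ELEMS = [BOT, "a", "b"]  # the flat poset {a, b}_⊥ (illustrative choice)

def leq(x, y):
    """Flat order: x <= y iff x is ⊥ or x == y."""
    return x is BOT or x == y

def subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def is_open(U):
    """With no infinite chains, Scott-open sets are exactly the upward-closed sets."""
    return all(y in U for x in U for y in ELEMS if leq(x, y))

OPEN_SETS = [set(U) for U in subsets(ELEMS) if is_open(set(U))]

def continuous(f):
    """Topological continuity: the preimage of every open set is open."""
    return all(is_open({x for x in ELEMS if f[x] in U}) for U in OPEN_SETS)

def monotone(f):
    """x <= y implies f(x) <= f(y)."""
    return all(leq(f[x], f[y]) for x in ELEMS for y in ELEMS if leq(x, y))

# All 27 maps ELEMS -> ELEMS, represented as lookup tables.
functions = [dict(zip(ELEMS, image)) for image in product(ELEMS, repeat=len(ELEMS))]
assert all(continuous(f) == monotone(f) for f in functions)
print(sum(monotone(f) for f in functions))  # 11
```

The 11 monotone maps are the 9 maps with $f(\perp) = \perp$ plus the two constant maps onto $a$ and onto $b$.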

As an example, we see that $(M^*; \le)$ is not a CPO. We can define infinite chains $A$ of successively longer strings (e.g., take $x_i = 0^i$ for $M = \{0, 1\}$) such that no limit in $M^*$ exists for this chain. However, note that such a chain always defines an infinite string $x^* \in M^\infty$ such that $x \le x^*$ holds for all $x \in A$. Therefore, the poset $(M^* \cup M^\infty; \le)$ is a CPO.⁶ This CPO can be used to model processes taking input and output sequences of arbitrary length.

Later on, we often use generalizations of the above limit notion, called the join and the meet, respectively. For a set $Z \subseteq X$, let $Z^{\uparrow} = \{z' \in X : \forall z \in Z : z \le z'\}$ be the set of upper bounds on $Z$. An element $z^* \in Z^{\uparrow}$ such that $z^* \le z$ for all $z \in Z^{\uparrow}$, if it exists, is called the join of $Z$ and denoted $\bigvee Z$. The set $Z^{\downarrow}$ of lower bounds and the meet $\bigwedge Z$ are defined symmetrically.
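On a finite poset, joins and meets can be computed directly from these definitions. The following Python sketch (function names are hypothetical) returns `None` when the join or meet does not exist, and illustrates it on binary strings of length at most 2 under the prefix order, an illustrative choice.

```python
def upper_bounds(Z, elems, leq):
    """Z^up = {z' in X : z <= z' for all z in Z}."""
    return [u for u in elems if all(leq(z, u) for z in Z)]

def lower_bounds(Z, elems, leq):
    """Z^down = {z' in X : z' <= z for all z in Z}."""
    return [lo for lo in elems if all(leq(lo, z) for z in Z)]

def join(Z, elems, leq):
    """Least element of Z^up (the join), or None if there is none."""
    ub = upper_bounds(Z, elems, leq)
    least = [u for u in ub if all(leq(u, v) for v in ub)]
    return least[0] if least else None

def meet(Z, elems, leq):
    """Greatest element of Z^down (the meet), or None if there is none."""
    lb = lower_bounds(Z, elems, leq)
    greatest = [lo for lo in lb if all(leq(v, lo) for v in lb)]
    return greatest[0] if greatest else None

STRINGS = ["", "0", "1", "00", "01", "10", "11"]

def prefix(x, y):
    """Prefix order: x <= y iff y starts with x."""
    return y.startswith(x)

print(join({"0", "01"}, STRINGS, prefix))       # 01
print(join({"0", "1"}, STRINGS, prefix))        # None (no common extension)
print(meet({"00", "01"}, STRINGS, prefix))      # 0
print(repr(join(set(), STRINGS, prefix)))       # '' (the bottom element)
```

The last line matches the identity $\perp = \bigvee \emptyset$: every element is an upper bound of the empty set, so its join is the least element.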

Equational descriptions and fixed points. We can now provide formal justification for our equational approach given above. Note that CPOs can be combined in a variety of ways, using common set operations, while preserving the CPO structure. For example, the Cartesian product $A \times B$ of two CPOs is a CPO with the component-wise partial ordering relation. Using Cartesian products, one can always describe every valid system of equations (as informally used in the previous paragraphs to define a process or a system) as the definition of a function $f$ of the form

$$f(z) = g(z, x) \quad \text{where} \quad x = h(z, x) \tag{1}$$

for some internal variable $x$ and bivariate continuous⁷ functions $h(z, x)$ and $g(z, x)$. An important property of CPOs is that every continuous function $f\colon X \to X$ admits a least fixed point, i.e., a minimal $x \in X$ such that $f(x) = x$, which can be obtained by taking the limit of the chain $\perp \le f(\perp) \le \cdots \le f^n(\perp) \le \cdots$, and admits an intuitive operational interpretation: starting from the initial value $x = \perp$, one keeps updating the value $x \leftarrow f(x)$ until the computation stabilizes.

Least fixed points are used to define the solution to recursive equations such as (1) above as follows, and to show that it is always defined, proving soundness of our approach. For any fixed $z$, the function $h_z(x) = h(z, x)$ is also continuous, and maps $X$ to itself. So, it admits a least fixed point $x_z = \bigvee_i h_z^i(\perp)$. The function defined by (1) is precisely $f(z) = g(z, x_z)$ where $x_z$ is the least fixed point of $h_z(x) = h(z, x)$. It is a standard exercise to show that the function $f(z)$ so defined is a continuous function of $z$.
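The recipe for equation (1) translates into a generic solver: compute $x_z$ by Kleene iteration from $\perp$, then return $g(z, x_z)$. The following Python sketch is an illustration under the assumption that the chain stabilizes after finitely many steps (true for finite histories, not for general CPOs). The running-sum example, defined by the hypothetical equations $x[1] = z[1]$ and $x[j+1] = x[j] + z[j+1]$, is ours and not from the text.

```python
def lfp(h, bottom):
    """Least fixed point by Kleene iteration: bottom, h(bottom), h(h(bottom)), ...
    Terminates only if the chain stabilizes after finitely many steps."""
    x = bottom
    while (nxt := h(x)) != x:
        x = nxt
    return x

def solve(g, h, bottom):
    """f(z) = g(z, x_z), where x_z is the least fixed point of h_z(x) = h(z, x)."""
    def f(z):
        x_z = lfp(lambda x: h(z, x), bottom)
        return g(z, x_z)
    return f

# Hypothetical example: running sums, x[1] = z[1], x[j+1] = x[j] + z[j+1].
def h(z, x):
    if not z:
        return ()
    return (z[0],) + tuple(x[j] + z[j + 1] for j in range(min(len(x), len(z) - 1)))

def g(z, x):
    return x

running_sum = solve(g, h, ())
print(running_sum((3, 1, 4, 1, 5)))  # (3, 4, 8, 9, 14)
```

Here $h$ is monotone in both arguments under the prefix order, so each iteration extends $x$ by at most one entry until it covers the whole input.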

⁶ In fact, one can usually define $M^\infty$ to be exactly the set of limits of infinite chains from $M^*$.

⁷ Continuity for bivariate functions is defined by regarding them as univariate functions with domain $Z \times X$.
Scott domains are a special class of CPOs satisfying a number of additional properties (technically, they are algebraic bounded complete CPOs) that are useful for the full development of the theory. As most of the concepts used in this paper can be fully described in terms of CPOs, we do not introduce additional definitions, and refer the reader to any introductory textbook on domain theory for a formal treatment of the subject.

Probabilistic processes. So far, our theory does not yet support the definition of processes with probabilistic behavior. Intuitively, we want to define a process as a continuous map from elements of a CPO $X$ to probability distributions over some CPO $Y$. We will now discuss how to define the set $D(Y)$ of such probability distributions, which turns out to be a CPO. Our approach follows [25].

Let $O(X)$ be the open sets of $X$, and $B(X)$ the Borel algebra of $X$, i.e., the smallest $\sigma$-algebra that contains $O(X)$. We recall that a probability distribution over $X$ is a function $p\colon B(X) \to [0, 1]$ that is countably additive and has total mass $p(X) = 1$. The set of probability distributions over a CPO $X$, denoted $D(X)$, is a CPO according to the partial order relation such that $p \le q$ if and only if $p(A) \le q(A)$ for all open sets $A \in O(X)$. This partial order on probability distributions $D(X)$ captures precisely the natural notion of evolution of a probabilistic process: the probability of a closed set can only decrease as the system evolves and probability mass “escapes” from it into the future. A probabilistic process $P$ with input in $X$ and output in $Y$ is described by a continuous function from $X$ to $D(Y)$ that on input an element $x \in X$ produces an output probability distribution $P(x) \in D(Y)$ over the set $Y$.
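For a finite domain, every subset is Borel, so a distribution can be stored as a table of point masses, and the order on $D(Y)$ can be checked by comparing masses on all open (upward-closed) sets. This Python sketch uses the flat one-bit domain $\{0, 1\}_\perp$ as an illustrative choice, with `None` standing for $\perp$ and exact `Fraction` arithmetic. The three sample distributions are hypothetical stages of a channel that eventually carries a uniform bit.

```python
from fractions import Fraction
from itertools import chain, combinations

BOT = None
Y = [BOT, 0, 1]  # flat domain {0, 1}_⊥: a channel carrying one bit (illustrative)

def leq(x, y):
    return x is BOT or x == y

def open_sets(elems):
    """Upward-closed subsets; on a finite poset these are the Scott-open sets."""
    subs = chain.from_iterable(combinations(elems, r) for r in range(len(elems) + 1))
    return [set(U) for U in subs
            if all(y in U for x in U for y in elems if leq(x, y))]

def prob(p, A):
    """Mass that distribution p (a dict of point masses) assigns to the set A."""
    return sum((p.get(y, Fraction(0)) for y in A), Fraction(0))

def dist_leq(p, q):
    """p <= q in D(Y) iff p(A) <= q(A) for every open set A."""
    return all(prob(p, A) <= prob(q, A) for A in open_sets(Y))

silent = {BOT: Fraction(1)}                           # nothing sent yet
half_sent = {BOT: Fraction(1, 2), 0: Fraction(1, 2)}  # bit 0 sent w.p. 1/2
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}         # a uniform bit was sent

print(dist_leq(silent, half_sent), dist_leq(half_sent, coin), dist_leq(coin, half_sent))
# True True False
```

Along the chain `silent <= half_sent <= coin`, the mass of the closed set $\{\perp\}$ drops from 1 to 1/2 to 0, which is the "escaping" probability mass described above.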

While these mathematical definitions may seem somewhat arbitrary and complicated, we reassure the reader that they correspond precisely to the common notion of probabilistic computation. For example, any function $P\colon X \to D(Y)$ can be uniquely extended to take probability distributions as input. The resulting function $P\colon D(X) \to D(Y)$, on input a distribution $D_X$, produces precisely what one would expect: the output probability distribution $D_Y = P(D_X)$ is obtained by first sampling $x \leftarrow D_X$ according to the input distribution, and then sampling the output according to $y \leftarrow P(x)$. Moreover, the resulting $P\colon D(X) \to D(Y)$ is continuous according to the standard topology of $D(X)$ and $D(Y)$.
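For finitely supported distributions, the "sample $x \leftarrow D_X$, then $y \leftarrow P(x)$" extension amounts to a weighted sum over the input's support. The following Python sketch illustrates this. The process `flip_if_received` is a hypothetical example on $\{0, 1\}_\perp$: it stays silent until a bit arrives and then outputs that bit XOR a fresh uniform bit. `None` stands for $\perp$.

```python
from collections import defaultdict
from fractions import Fraction

BOT = None

def extend(P):
    """Lift P: X -> D(Y) to D(X) -> D(Y): sample x <- D_X, then y <- P(x)."""
    def P_bar(D_X):
        D_Y = defaultdict(Fraction)  # Fraction() == 0
        for x, px in D_X.items():
            for y, py in P(x).items():
                D_Y[y] += px * py
        return dict(D_Y)
    return P_bar

def flip_if_received(x):
    """Hypothetical process on {0,1}_⊥: once bit x arrives, output x xor r, r uniform."""
    if x is BOT:
        return {BOT: Fraction(1)}
    return {x: Fraction(1, 2), 1 - x: Fraction(1, 2)}

D_X = {BOT: Fraction(1, 4), 0: Fraction(3, 4)}
print(extend(flip_if_received)(D_X))
# {None: Fraction(1, 4), 0: Fraction(3, 8), 1: Fraction(3, 8)}
```

This lifting is the Kleisli extension of the finite-distribution monad, which is the "bind" operation mentioned below.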

The fact that a distribution $D_X \in D(X)$ and a function $f\colon X \to D(Y)$ can be combined to obtain an output distribution $D_Y$ allows us to extend our equational treatment of systems to probabilistic computations. A probabilistic system is described by a set of equations similar to (1), except that $h$ is a continuous function from $Z \times X$ to $D(X)$, and we write the equation in the form $x \leftarrow h(z, x)$ to emphasize that $h(z, x)$ is a probability distribution to sample from, rather than a single value. For any fixed $z$, the function $h_z(x) = h(z, x)$ is continuous, and it can be extended to a continuous function $h_z\colon D(X) \to D(X)$. The least fixed point of this function is a probability distribution $D_z \in D(X)$, and the function $f$ maps the value $z$ to the distribution $g(z, D_z)$.

Formally, the standard mathematical tool to give the equations a precise meaning is the use of monads, where $\leftarrow$ corresponds to the monad “bind” operation. We reassure the reader that this is all standard and well studied in the context of category theory and programming language design, both in theory and practice, e.g., as implemented in mainstream functional programming languages like Haskell. Rigorous mathematical definitions to support the definition of systems of probabilistic equations can be easily given within the framework of domain theory, but no deep knowledge of the theory is necessary to work with the equations, just like knowledge of denotational semantics is not needed to write working computer programs.
2.3  Multi-party  computation,  security  and  composability

So far, we have developed a domain-theoretic framework to define processes, their composition, and their asynchronous interaction. We still need to define what it means for such a system to implement a multi-party protocol, and what it means for such a protocol to securely implement some functionality. Throughout this section, we give definitions in the deterministic case for simplicity. The definitions extend naturally to probabilistic processes by letting the output be a probability distribution over (the product of) the output sets.

We model secure multi-party computation along the lines described in the introduction. A secure computation task is modeled by an $n$-party functionality $F$. In the deterministic case, $F$ maps $n$ inputs $(x_1, \ldots, x_n)$ to $n$ outputs $(y_1, \ldots, y_n)$; in the probabilistic case, it maps them to a distribution on a set of $n$ outputs. Each input or output variable is associated with a specific domain $X_i$/$Y_i$, and $F$ is a continuous function $F : (X_1 \times \cdots \times X_n) \to (Y_1 \times \cdots \times Y_n)$, typically described by a system of domain equations. Each pair $X_i/Y_i$ corresponds to the input and output channels used by user $i$ to access the functionality. We remark that, within our framework, even if $F$ is a (pure) mathematical function, it still models a reactive functionality that can receive inputs and produce outputs asynchronously in multiple rounds.

Sometimes, one knows in advance that $F$ will be used within a certain context. (For example, in the next section we will consider a multicast channel that is always used for broadcast, i.e., in a context where the set of recipients is always the entire group of users.) In these settings, for efficiency reasons, it is useful to consider protocols that do not implement the functionality $F$ directly, but only the use of $F$ within the prescribed context. We formalize this usage by introducing the notion of a protocol implementing an interface to a functionality. An interface is a collection of continuous functions $I_i : X'_i \times Y_i \to X_i \times Y'_i$, where $X'_i, Y'_i$ are the input and output domains of the interface. Combining the interface $I = I_1 \mid \cdots \mid I_n$ with the functionality $F$ yields a system $(F \mid I)$ with inputs $X'_1, \ldots, X'_n$ and outputs $Y'_1, \ldots, Y'_n$ that offers limited access to $F$. The standard definition of (universally composable) security corresponds to setting $I$ to the trivial interface where $X'_i = X_i$, $Y'_i = Y_i$, and each $I_i$ is the identity function, offering direct access to $F$.

Ideal functionalities can be used both to describe protocol problems and to describe underlying communication models. Let $N : S_1 \times \cdots \times S_n \to R_1 \times \cdots \times R_n$ be an arbitrary ideal functionality. One may think of $N$ as modeling a communication network where user $i$ sends $s_i \in S_i$ and receives $r_i \in R_i$, but all definitions apply to arbitrary $N$.

A protocol implementing an interface $I$ to functionality $F$ in the communication model $N$ is a collection of functions $P_1, \ldots, P_n$, where $P_i : X'_i \times R_i \to Y'_i \times S_i$. We consider the execution of protocol $P$ in a setting where an adversary can corrupt a subset of the participants. The set of corrupted players $A \subseteq \{1, \ldots, n\}$ must belong to a given family $\mathcal{A}$ of allowable sets, e.g., all sets of size less than $n/2$ in case security is to be guaranteed only for honest majorities. We can now define security.

Definition 1 Protocol $P$ securely implements interface $I$ to functionality $F$ in the communication model $N$ if the following holds. For any allowable set $A \in \mathcal{A}$ and complementary set $H = \{1, \ldots, n\} \setminus A$, there is a simulator $S : S_A \times Y_A \to X_A \times R_A$ such that the systems $(P_H \mid N)$ and $(S \mid I_H \mid F)$ are equivalent, i.e., they define the same function.

$(P_H \mid N)$ is called the real system. It corresponds to an execution of the protocol in which the users in $A$ are corrupted, while those in $H$ are honest and follow the protocol. It is useful to illustrate this system with a diagram; see Figure 9 (left). We see from the diagram that the real system has inputs $X'_H, S_A$ and outputs $Y'_H, R_A$.

In the ideal setting, when the adversary corrupts the users in $A$, we are left with the system $I_H \mid F$, because corrupted users are not bound to use the intended interface $I$. This system $I_H \mid F$ has inputs $X'_H, X_A$ and outputs $Y'_H, Y_A$. To turn this system into one with the same inputs and outputs as the real one, we need a simulator of type $S : S_A \times Y_A \to X_A \times R_A$. When we compose $S$ with $I_H \mid F$, we get a system $(S \mid I_H \mid F)$ with the same input and output variables as the real system $(P_H \mid N)$; see Figure 9 (right).

For the protocol to be secure, the two systems must be equivalent. This shows that any attack that can be carried out on the real system by corrupting the set $A$ can be simulated on the ideal system through the simulator.

Figure 9: The protocol $(P_1, \ldots, P_4)$ securely implements interface $(I_1, \ldots, I_4)$ to functionality $F$ in the communication model $N$.

When composing protocols together, $N$ is not a communication network, but an ideal functionality representing a hybrid model. In this setting, we say that protocol $P$ accesses $N$ through interface $J = (J_1, \ldots, J_n)$ if each party runs a program of the form $P_i = J_i \mid P'_i$. If this is the case, we say that $P$ securely implements interface $I$ to functionality $F$ through interface $J$ to communication model $N$.

Composition theorems in our framework come essentially for free, and their proofs follow easily from the general properties of systems of equations. For example, we have the following rather general composition theorem.

Theorem 1 Assume $P$ securely implements interface $I$ to $F$ in the communication model $N$, and $Q = Q' \mid I$ securely implements $G$ through interface $I$ to $F$. Then the composed protocol $Q' \mid P$ securely implements $G$ in the communication model $N$.

The  simple  proof  is  similar  to  the  informal  argument  presented  in  the  introduction,  and  it  is 
left  to  the  reader  as  an  exercise.  The  composition  theorem  is  easily  extended  in  several  ways,  e.g., 
by  considering  protocols  Q  that  only  implement  a  given  interface  J  to  G,  and  protocols  P  that  use 
N  through  some  given  interface  J'. 
Figure 10: The Broadcast protocol


3  Secure  Broadcast 

In this section we provide, as a simple case study, the analysis of a secure broadcast protocol implemented on an asynchronous point-to-point network. The protocol is similar to Bracha's reliable broadcast protocol [8]. We proceed in two steps. In the first step, we build a weak broadcast protocol, which provides consistency but does not guarantee that all parties terminate with an output. In the second step, we use the weak broadcast protocol to build a protocol achieving full security. We present the two steps in reverse order: we first show how to strengthen a weak broadcast protocol, and then implement the weak broadcast on a point-to-point network.

3.1  Building  broadcast  from  weak  broadcast 

In this section we build a secure broadcast protocol on top of a weak broadcast channel and a point-to-point communication network. The broadcast, weak broadcast, and communication network are described in Figure 10 (left). The broadcast functionality (BCast) receives a message $x$ from a dealer, and sends a copy $y_i = x$ to each player. The weak broadcast channel (WCast) allows a dishonest dealer to specify, using a boolean vector $w \in \{\bot, \top\}^n$, which subset of the players will receive the message.

Notice that the functionality WCast described in Figure 10 is in fact a multicast channel, which allows the sender to transmit a message to any subset of players of its choice. We call it a weak broadcast rather than a multicast because we will not use (or implement) this functionality at its full power. The honest dealer in our protocol will always set all $w_i = \top$, and use WCast as a broadcast channel $\mathrm{BCast}(x) = \mathrm{WCast}(x, \top^n)$. The auxiliary inputs $w_i$ are used only to capture the extra power given to a dishonest dealer. By not following the protocol, such a dealer may restrict the delivery of the message $x$ to a subset of the players. This will be used in the next section to provide a secure implementation of WCast on top of a point-to-point communication network.
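The relation between the two channels can be sketched in a few lines. In this hypothetical Python encoding (which is not taken from Figure 10), `None` stands for ⊥, i.e., no message delivered, and the booleans `True`/`False` stand for ⊤/⊥ in the vector $w$.

```python
from typing import List, Optional, Sequence, TypeVar

M = TypeVar("M")


def wcast(x: M, w: Sequence[bool]) -> List[Optional[M]]:
    """WCast(x, w): player i receives x AND w[i], i.e. x if w[i] is TOP and bottom (None) otherwise."""
    return [x if wi else None for wi in w]


def bcast(x: M, n: int) -> List[Optional[M]]:
    """BCast(x) = WCast(x, TOP^n): the only way an honest dealer uses the channel."""
    return wcast(x, [True] * n)


print(bcast("m", 3))                    # ['m', 'm', 'm']
print(wcast("m", [True, False, True]))  # ['m', None, 'm']: a dishonest dealer withholds from the 2nd player
```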

Figure 11: Security of broadcast protocol when the dealer is honest

The broadcast protocol is very simple and is shown in Figure 10 (right). The dealer uses WCast to transmit its input message $x$ to all $n$ players by setting $w[i] = \top$ for all $i \in [n]$. The players then have access to a network functionality Net to exchange messages among themselves.

The program run by the players makes use of two threshold functions $t_1$ and $t_2$, each taking $n$ inputs. Fix any admissible set of corrupted players $A \subseteq \{1, \ldots, n\}$ with complementary set $H = \{1, \ldots, n\} \setminus A$, any input vector $u$, and any value $x$. The threshold functions are assumed to satisfy two properties:
- $t_1(u[A], (x)^H) = t_2(u[A], (x)^H) = x$, i.e., if all honest players agree on $x$, then $t_1$ and $t_2$ output $x$ irrespective of the other values;
- $t_1((\bot)^A, u[H]) \geq t_2((\top)^A, u[H])$, i.e., for any set of values provided by the honest players, $t_1$ is always at least as large as $t_2$, regardless of the other values.

It is easy to see that the threshold functions $t_i(u) = \bigvee_{|S| = k_i} \bigwedge_{j \in S} u_j$ satisfy these properties provided $|A| < k_1 \leq k_2 - |A| \leq n - 2|A|$. In particular, this requires $n \geq 3|A| + 1$.
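Both threshold properties can be checked mechanically on small instances. The Python sketch below rests on assumptions of our own:
- messages are small integers;
- `None` and the sentinel `TOP` encode ⊥ and ⊤ of the flat domain;
- players are numbered from 0;
- $t_k$ is computed literally as a join of meets over all $k$-subsets, so only tiny $n$ are feasible.

It checks the case $n = 4$, $|A| = 1$, $k_1 = 2$, $k_2 = 3$.

```python
from itertools import combinations, product


class _Top:
    """Sentinel for the top element (conflicting values) of the flat domain."""

    def __repr__(self) -> str:
        return "TOP"


BOT, TOP = None, _Top()  # bottom = "no value yet", top = "conflict"


def meet(a, b):
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else BOT


def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return a if a == b else TOP


def leq(a, b):
    """Order of the flat domain: BOT <= every value <= TOP."""
    return a is BOT or b is TOP or a == b


def threshold(k, u):
    """t_k(u): join over all k-subsets S of the meet of u[j] for j in S (exponential in n)."""
    out = BOT
    for S in combinations(range(len(u)), k):
        m = TOP
        for j in S:
            m = meet(m, u[j])
        out = join(out, m)
    return out


def check_broadcast_thresholds(n, A, k1, k2, values=(0, 1)):
    """Exhaustively test both threshold properties for one corrupted set A (players 0..n-1)."""
    H = [i for i in range(n) if i not in A]
    domain = [BOT, *values]

    def vec(a_part, h_part):
        u = [BOT] * n
        for i, v in zip(sorted(A), a_part):
            u[i] = v
        for i, v in zip(H, h_part):
            u[i] = v
        return u

    # Property 1: if all honest entries equal x, both thresholds output x.
    for x in values:
        for a_part in product(domain, repeat=len(A)):
            u = vec(a_part, [x] * len(H))
            if threshold(k1, u) != x or threshold(k2, u) != x:
                return False
    # Property 2: t1(BOT^A, u[H]) >= t2(TOP^A, u[H]) for every honest part u[H].
    for h_part in product(domain, repeat=len(H)):
        lo = threshold(k1, vec([BOT] * len(A), h_part))
        hi = threshold(k2, vec([TOP] * len(A), h_part))
        if not leq(hi, lo):
            return False
    return True


print(check_broadcast_thresholds(n=4, A={0}, k1=2, k2=3))  # True
```

Changing the parameters, e.g. to $k_1 = 1$ or to $k_1 = k_2 = 3$, makes the check fail, which is consistent with the constraints on $k_1, k_2$.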

In  the  security  analysis,  we  distinguish  two  cases,  depending  on  whether  the  dealer  is  corrupt 
or  not. 

Honest dealer. First we consider the simple case where the adversary corrupts a set of players $A \subseteq \{1, \ldots, n\}$ and the dealer behaves honestly. Let $H = \{1, \ldots, n\} \setminus A$ be the set of honest players. An execution of the protocol when players in $A$ are corrupted is described by the system (Dealer | Player[H] | WCast | Net), with input $(x, s_A)$ and output $(y_H, y'_A, r_A)$, depicted in Figure 11 (left). In this figure (and the following ones), double arrows and boxes denote parallel processes and channels. We combine the defining equations of Dealer, Player[h] for $h \in H$, WCast and Net, and introduce the auxiliary variables $u_h = x \vee t_1(s_A[h], s_H[h])$ for all $h \in H$. We get that for any $i, j \in [n]$ and $h \in H$ the following holds:

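$$
\begin{aligned}
r_i[j] &= s_j[i] \\
y'_i &= x' \wedge w[i] = x \wedge \top = x \\
s_h[i] &= y'_h \vee t_1(r_h[1], \ldots, r_h[n]) = x \vee t_1(s_A[h], s_H[h]) = u_h \\
y_h &= t_2(r_h[1], \ldots, r_h[n]) = t_2(s_A[h], s_H[h]) = t_2(s_A[h], u_H) \\
u_h &= x \vee t_1(s_A[h], s_H[h]) = x \vee t_1(s_A[h], u_H).
\end{aligned}
$$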

The last equation $u_h = x \vee t_1(s_A[h], u_H)$ provides a recursive definition of $u_H$. It can be easily solved by an iterative least fixed point computation:
1. Start from $u_H^{(0)} = \bot^H$.
2. We get $u_H^{(1)} = (x \vee t_1(s_A[h], \bot^H))_{h \in H} = x^H$. This holds since, by monotonicity and the first property of $t_1$, $t_1(s_A[h], \bot^H) \leq t_1(s_A[h], x^H) = x$.
3. Iterating again gives $u_H^{(2)} = (x \vee t_1(s_A[h], x^H))_{h \in H} = x^H$.

Therefore the least fixed point is $u_H = x^H$. Substituting $u_H = x^H$ in the previous equations, and using the properties of $t_2$, we see that the system of equations defined by (Dealer | Player[H] | WCast | Net) is equivalent to

$$
\begin{aligned}
r_a &= (s_A[a], x^H) && (a \in A) \\
y'_a &= x && (a \in A) \\
y_h &= t_2(s_A[h], x^H) = x && (h \in H)
\end{aligned}
\tag{2}
$$
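The least fixed point computation above can be replayed numerically. The Python sketch below uses a small flat-domain encoding of our own: `None` for ⊥, a sentinel `TOP` for ⊤, and players numbered from 0. The value `s_A[a][h]` is the arbitrary message that corrupted player $a$ sends to $h$. The sketch iterates the derived equation for $u_H$ from $\bot^H$; it does not execute the protocol programs themselves.

```python
from itertools import combinations, product


class _Top:
    def __repr__(self) -> str:
        return "TOP"


BOT, TOP = None, _Top()


def meet(a, b):
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else BOT


def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return a if a == b else TOP


def threshold(k, u):
    """t_k(u): join over all k-subsets S of the meet of u[j] for j in S."""
    out = BOT
    for S in combinations(range(len(u)), k):
        m = TOP
        for j in S:
            m = meet(m, u[j])
        out = join(out, m)
    return out


def honest_dealer_run(n, A, x, s_A, k1, k2):
    """Iterate u_h = x v t1(s_A[h], u_H) from bottom; return (iterates of u_H, outputs y_H)."""
    H = [h for h in range(n) if h not in A]

    def received(h, u):  # r_h[j] = s_j[h], and every honest j sends s_j[h] = u_j
        return [s_A[j][h] if j in A else u[j] for j in range(n)]

    u = {h: BOT for h in H}
    trace = [u]
    while True:
        nxt = {h: join(x, threshold(k1, received(h, u))) for h in H}
        if nxt == u:
            break
        u = nxt
        trace.append(u)
    y = {h: threshold(k2, received(h, u)) for h in H}
    return trace, y


trace, y = honest_dealer_run(4, {0}, 7, {0: [1, 1, 1, 1]}, k1=2, k2=3)
print(trace)  # [{1: None, 2: None, 3: None}, {1: 7, 2: 7, 3: 7}]
print(y)      # {1: 7, 2: 7, 3: 7}

# Every behavior of the single corrupted player leads to y_h = x for all honest h.
assert all(
    honest_dealer_run(4, {0}, 7, {0: list(row)}, 2, 3)[1] == {1: 7, 2: 7, 3: 7}
    for row in product([BOT, 0, 1], repeat=4)
)
```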

We now show that an equivalent system can be obtained by combining the ideal functionality BCast with a simulator Sim as in Figure 11 (right). The simulator takes $(y_A, s_A)$ as input. It must output $(y'_A, r_A)$ such that (Sim | BCast) is equivalent to the system (Dealer | Player[H] | WCast | Net) specified by the last set of equations. The simulator is given in Figure 12 (left). It is immediate to verify the following: combining the equations of the simulator Sim with the equations $y_i = x$ of the ideal broadcast functionality, and eliminating local variables, yields a system of equations identical to (2).

$$
\begin{aligned}
&\mathrm{Sim}(y_A, s_A) = (y'_A, r_A): \\
&\quad y'_A = y_A \\
&\quad r_A[a] = s_a[A] \quad (a \in A) \\
&\quad r_A[h] = y_A \quad (h \in H)
\end{aligned}
\qquad
\begin{aligned}
&\mathrm{Sim}'(x', w, y_A, s_A) = (x, r_A, y'_A): \\
&\quad u_h = (x' \wedge w[h]) \vee t_1(s_A[h], u_H) \quad (h \in H) \\
&\quad x = t_2(s_A[h], u_H) \quad (h = \min H) \\
&\quad y'_a = x' \wedge w[a] \quad (a \in A) \\
&\quad r_a[A] = s_A[a] \quad (a \in A) \\
&\quad r_a[H] = u_H \quad (a \in A)
\end{aligned}
$$

Figure 12: Simulators for the broadcast protocol when the dealer is honest (left) or dishonest (right)

Figure 13: Security of broadcast protocol when the dealer is corrupted.


Dishonest dealer. We now consider the case where both the dealer and a subset of players $A$ are corrupted. As before, let $H = \{1, \ldots, n\} \setminus A$ be the set of honest players. The system corresponding to a real execution of the protocol when Dealer and Player[A] are corrupted is (Player[H] | WCast | Net). It maps $(x', w, s_A)$ to $(y_H, r_A, y'_A)$; see Figure 13 (left). We use the defining equations of Player[H], WCast and Net, and introduce auxiliary variables $u_h = y'_h \vee t_1(r_h[1], \ldots, r_h[n])$ for $h \in H$. We get the following set of equations:


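$$
\begin{aligned}
y_h &= t_2(r_h[A], r_h[H]) = t_2(s_A[h], u_H) && (h \in H) \\
y'_a &= x' \wedge w[a] && (a \in A) \\
r_a[A] &= s_A[a] && (a \in A) \\
r_a[H] &= u_H && (a \in A) \\
u_h &= (x' \wedge w[h]) \vee t_1(s_A[h], u_H) && (h \in H)
\end{aligned}
\tag{3}
$$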


This time the simulator Sim' takes input $(x', w, y_A, s_A)$ and outputs $(x, r_A, y'_A)$; see Figure 13 (right). With these inputs and outputs, the simulator can directly set all variables except $y_H$ just as in the real system (3). The simulator can also compute the value $y_h$. However, it cannot set $y_h$ directly, because this variable is defined by the ideal functionality as $y_h = x$.

We will prove that all variables $y_h$ defined by (3) take the same value. It follows that the simulator can set $x = y_h$ for any $h \in H$. The system (BCast | Sim') will then be equivalent to (3), and therefore to (Player[H] | WCast | Net). The code of the simulator is given in Figure 12 (right), where $x = y_h$ is arbitrarily selected using the smallest index $h = \min H$. (Any other choice of $h$ would have been fine.)

It remains to prove that all $y_h$ take the same value. By antisymmetry, it is enough to show that $y_i \leq y_j$ for all $i, j \in H$. This easily follows from the assumptions on $t_1, t_2$. In fact, by monotonicity, we have
$$
y_i = t_2(s_A[i], u_H) \leq t_2(\top^A, u_H) \leq t_1(\bot^A, u_H) \leq t_1(s_A[j], u_H) \leq u_j.
$$
Since this holds for every $j \in H$, we have $(y_i)^H \leq u_H$. It immediately follows that $y_j = t_2(s_A[j], u_H) \geq t_2(s_A[j], (y_i)^H) = y_i$.
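The consistency argument can also be confirmed by brute force on a tiny instance. The Python sketch below is illustrative only, and its encoding is our own assumption:
- players are numbered from 0;
- there is a single corrupted player $a$, so enumerating its message vector covers all of its behaviors;
- messages range over {0, 1};
- `w` is a tuple of booleans.

It solves equations (3) by Kleene iteration. It then checks that all honest outputs $y_h$ coincide for every choice of $x'$, $w$ and $s_A$.

```python
from itertools import combinations, product


class _Top:
    def __repr__(self) -> str:
        return "TOP"


BOT, TOP = None, _Top()


def meet(a, b):
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else BOT


def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return a if a == b else TOP


def threshold(k, u):
    """t_k(u): join over all k-subsets S of the meet of u[j] for j in S."""
    out = BOT
    for S in combinations(range(len(u)), k):
        m = TOP
        for j in S:
            m = meet(m, u[j])
        out = join(out, m)
    return out


def dishonest_dealer_outputs(n, A, x_prime, w, s_A, k1, k2):
    """Honest outputs y_H of equations (3), with the dealer and the players in A corrupted.

    w[h] is True for TOP; s_A[a][h] is the message corrupted player a sends to h.
    """
    H = [h for h in range(n) if h not in A]
    delivered = {h: (x_prime if w[h] else BOT) for h in H}  # y'_h = x' AND w[h]

    def received(h, u):
        return [s_A[j][h] if j in A else u[j] for j in range(n)]

    u = {h: BOT for h in H}
    while True:  # Kleene iteration for u_h = (x' AND w[h]) v t1(s_A[h], u_H)
        nxt = {h: join(delivered[h], threshold(k1, received(h, u))) for h in H}
        if nxt == u:
            break
        u = nxt
    return {h: threshold(k2, received(h, u)) for h in H}


def all_honest_outputs_agree(n=4, a=0, k1=2, k2=3, values=(0, 1)):
    """Try every (x', w) and every message vector of the single corrupted player a."""
    for x_prime in values:
        for w in product([False, True], repeat=n):
            for row in product([BOT, *values], repeat=n):
                y = dishonest_dealer_outputs(n, {a}, x_prime, w, {a: list(row)}, k1, k2)
                first = next(iter(y.values()))
                if any(v != first for v in y.values()):
                    return False
    return True


print(all_honest_outputs_agree())  # True
```

With $k_2 = 2$, which violates $k_1 \leq k_2 - |A|$, the same search finds executions in which honest players disagree.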

3.2  Weak  broadcast 

In this section we show how to implement the weak broadcast functionality WCast given in Figure 10, to be used within the BCast protocol discussed in the previous section, and analyze its security. We recall that WCast is a multicast functionality connecting a dealer to $n$ other parties. It allows the dealer to send a message $x'$ to a subset of the parties specified by a vector $w \in \{\bot, \top\}^n$.

We stress that we do not need a secure implementation of WCast in its full generality. Our higher-level broadcast protocol (BCast) uses WCast in a rather restricted way: it always sets $w = (\top)^n$ and transmits $x'$ to all parties. Accordingly, we give a protocol that securely implements this interface to WCast. Formally, the dealer's interface Int takes only $x'$ as external input, and passes it along with $w = \top^n$ to the ideal functionality. The other parties have unrestricted access to the ideal functionality, and their interface is the identity function (or empty system of equations).

We implement interface Int to WCast on top of a point-to-point communication network similar to the Net functionality described in Figure 10. The only difference is that here the dealer can also send messages. The protocol is very simple: the dealer transmits the input $x'$ to all parties, and the parties retransmit the message to each other. Each party sets its output using a threshold function of the messages received from the other parties. The equations corresponding to the network Net', interface Int, and protocol programs Dealer, Player[1], ..., Player[n] are given in Figure 14. For reference, we have also repeated the definition of WCast from Figure 10.

The function $t$ is assumed to satisfy the following properties:
- $t(u[A], (x)^H) = x$, i.e., if all honest parties agree on $x$, then the output is $x$;
- for all vectors $u, u'$ with $u[H] = u'[H]$, either $t(u) = t(u')$ or at least one of $t(u), t(u')$ equals $\bot$.

It is easy to see that the threshold function $t(u) = \bigvee_{|S| = k} \bigwedge_{j \in S} u_j$ satisfies both properties for $(n + |A|)/2 < k \leq n - |A|$. Namely, take any two vectors $u, u'$ with $u[H] = u'[H]$. Assume that there exist sets $S$ and $S'$ of size $k$ such that $u_j = x$ for all $j \in S$ and $u'_j = y$ for all $j \in S'$. Then $|S \cap S' \cap H| \geq 2k - n - |A| > 0$, so some honest index $j$ satisfies $x = u_j = u'_j = y$. Hence $x = y$.
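The two properties of $t$, and the role of the lower bound on $k$, can be checked exhaustively for small parameters. This Python sketch uses our own encoding: `None` for ⊥, a sentinel `TOP` for ⊤, players numbered from 0, and messages in {0, 1}. The helper `compatible` expresses "equal, or one of them is ⊥".

```python
from itertools import combinations, product


class _Top:
    def __repr__(self) -> str:
        return "TOP"


BOT, TOP = None, _Top()


def meet(a, b):
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else BOT


def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return a if a == b else TOP


def threshold(k, u):
    """t_k(u): join over all k-subsets S of the meet of u[j] for j in S."""
    out = BOT
    for S in combinations(range(len(u)), k):
        m = TOP
        for j in S:
            m = meet(m, u[j])
        out = join(out, m)
    return out


def compatible(a, b):
    """The weak-consistency relation: equal, or at least one side is bottom."""
    return a is BOT or b is BOT or a == b


def check_weak_threshold(n, A, k, values=(0, 1)):
    H = [i for i in range(n) if i not in A]
    domain = [BOT, *values]

    def vec(a_part, h_part):
        u = [BOT] * n
        for i, v in zip(sorted(A), a_part):
            u[i] = v
        for i, v in zip(H, h_part):
            u[i] = v
        return u

    # First property: unanimous honest input x yields x.
    for x in values:
        for a_part in product(domain, repeat=len(A)):
            if threshold(k, vec(a_part, [x] * len(H))) != x:
                return False
    # Second property: vectors that agree on H give compatible outputs.
    for h_part in product(domain, repeat=len(H)):
        outs = [threshold(k, vec(a_part, h_part)) for a_part in product(domain, repeat=len(A))]
        if not all(compatible(o1, o2) for o1 in outs for o2 in outs):
            return False
    return True


print(check_weak_threshold(4, {0}, 3))  # True:  (4 + 1)/2 < 3 <= 4 - 1
print(check_weak_threshold(4, {0}, 2))  # False: k = 2 is too small
```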

As  usual,  we  consider  two  cases  in  the  proof  of  security,  depending  on  whether  the  dealer  is 
corrupted  or  not. 

$$
\begin{aligned}
&\mathrm{WCast}(x', w) = (y'_1, \ldots, y'_n): && y'_i = x' \wedge w[i] && (i = 1, \ldots, n) \\
&\mathrm{Dealer}(x') = (s'_0): && s'_0[i] = x' && (i = 1, \ldots, n) \\
&\mathrm{Int}(x') = (x', w): && w[i] = \top && (i = 1, \ldots, n) \\
&\mathrm{Player}[i](r'_i) = (y'_i, s'_i): && s'_i[j] = r'_i[0] && (j = 1, \ldots, n) \\
& && y'_i = t(r'_i[1], \ldots, r'_i[n]) \\
&\mathrm{Net}'(s'_0, \ldots, s'_n) = (r'_1, \ldots, r'_n): && r'_i[j] = s'_j[i] && (i = 1, \ldots, n;\ j = 0, \ldots, n)
\end{aligned}
$$

Figure 14: Weak broadcast protocol.

Dishonest dealer. It is convenient to consider the case when the dealer is dishonest first, as some of the derived equations will be useful in the honest dealer case too. Besides the dealer, the players in $A \subseteq \{1, \ldots, n\}$ are corrupted, and we let $H = \{1, \ldots, n\} \setminus A$ be the set of honest players. We consider the real-world system (Player[H] | Net') consisting of the honest participants and the network Net'. This is a system with input $(s'_0, s'_A)$ and output $(y'_H, r'_A)$. It is described by the defining equations of Player[h] for $h \in H$ and Net' given in Figure 14. We use these equations to express each output variable of the system in terms of the input variables. For $y'_h$ ($h \in H$) we have
$$
\begin{aligned}
y'_h &= t(r'_h[1], \ldots, r'_h[n]) \\
&= t(s'_1[h], \ldots, s'_n[h]) \\
&= t(s'_A[h], s'_H[h]) \\
&= t(s'_A[h], r'_H[0]) = t(s'_A[h], s'_0[H]).
\end{aligned}
$$

For the other output variables $r'_A[i]$ we distinguish two cases, depending on whether $i \in A$. For $a \in A$, we immediately get $r'_A[a] = s'_a[A]$. For $h \in H$, we have $r'_A[h] = s'_h[A] = (r'_h[0])^A = (s'_0[h])^A$. The resulting system is given by the following equations:
$$
\begin{aligned}
r'_A[a] &= s'_a[A] && (a \in A) && (4) \\
r'_A[h] &= (s'_0[h])^A && (h \in H) && (5) \\
y'_h &= t(s'_A[h], s'_0[H]) && (h \in H) && (6)
\end{aligned}
$$

We now turn to the simulator. Recall that the simulator should turn the system defined by WCast into one equivalent to the real-world system. To this end, the simulator should take $s'_0$, $s'_A$ and $y'_A$ as input, the first two from the external environment and the last from the ideal functionality. It should output $x', w$ to the ideal functionality and $r'_A$ to the external environment. Notice that the simulator has all the inputs necessary to compute the values defined by the real system, and in fact can set $r'_A$ using just those equations.

The only difficulty is that the simulator cannot set $y'_h$ directly. It has only indirect control over its value, through the ideal functionality and the variables $x', w$. From the properties of the function $t$, we know that all values $y'_h = t(s'_A[h], s'_0[H])$ are either equal to one common value or equal to $\bot$. So, the simulator can set $x'$ to this common value, and use $w$ to force some $y'_h$ to $\bot$ as appropriate. The simulator Sim' is given in Figure 15 (right). It is easy to verify that (Sim' | WCast) is equivalent to the real system.
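For small parameters one can also compare the simulator directly against the real system. The Python sketch below uses a hypothetical encoding: players are numbered from 0, `s0[j]` is the dealer's message to player $j$, and there is a single corrupted player `a`. It compares only the honest outputs $y'_H$, since $r'_A$ is set by the same equations in both worlds. It computes $x'$ and $w$ as in the simulator Sim'. It then checks that WCast reproduces equation (6) for every choice of $s'_0$ and $s'_A$.

```python
from itertools import combinations, product


class _Top:
    def __repr__(self) -> str:
        return "TOP"


BOT, TOP = None, _Top()


def meet(a, b):
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else BOT


def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return a if a == b else TOP


def threshold(k, u):
    """t_k(u): join over all k-subsets S of the meet of u[j] for j in S."""
    out = BOT
    for S in combinations(range(len(u)), k):
        m = TOP
        for j in S:
            m = meet(m, u[j])
        out = join(out, m)
    return out


def real_y(n, A, s0, sA, k):
    """Equation (6): y'_h = t(s'_A[h], s'_0[H]); each honest j relays s'_j[h] = s'_0[j]."""
    H = [h for h in range(n) if h not in A]
    return {h: threshold(k, [sA[j][h] if j in A else s0[j] for j in range(n)]) for h in H}


def simulator(n, A, s0, sA, k):
    """Sim' as in Figure 15 (right): x' = join of all y'_h, and w[h] = (y'_h > BOT)."""
    ys = real_y(n, A, s0, sA, k)
    x_prime = BOT
    for v in ys.values():
        x_prime = join(x_prime, v)
    return x_prime, {h: v is not BOT for h, v in ys.items()}


def ideal_y(x_prime, w):
    """WCast on the honest players: y'_h = x' AND w[h]."""
    return {h: (x_prime if bit else BOT) for h, bit in w.items()}


def simulation_matches(n=4, a=0, k=3, values=(0, 1)):
    """Compare real and simulated honest outputs for every s'_0 and every message vector of a."""
    domain = [BOT, *values]
    for s0 in product(domain, repeat=n):
        for row in product(domain, repeat=n):
            sA = {a: list(row)}
            x_prime, w = simulator(n, {a}, list(s0), sA, k)
            if ideal_y(x_prime, w) != real_y(n, {a}, list(s0), sA, k):
                return False
    return True


print(simulation_matches())  # True
```

For $k = 2$ the check fails. Two honest players can then obtain different non-⊥ values, their join is ⊤, and no choice of $x'$ works.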

Honest dealer. In this case, we first consider the real-world system (Dealer | Player[H] | Net'), consisting of the dealer, the honest participants $H \subseteq \{1, \ldots, n\}$, and the network Net'. The corrupted parties are given by the set $A = \{1, \ldots, n\} \setminus H$. This is a system with input $(x', s'_A)$ and output $(y'_H, r'_A)$. It is described by the defining equations of Dealer, Player[h] for $h \in H$, and Net' given in Figure 14. Notice that this is a superset of the equations for the real-world system (Player[H] | Net') considered in the dishonest dealer case. So, equations (4), (5) and (6) are still valid.

Adding the equations from Dealer and using the properties of $t$, we get $y'_h = t(s'_A[h], s'_0[H]) = t(s'_A[h], (x')^H) = x'$. Similarly, for $h \in H$, we have $r'_A[h] = (s'_0[h])^A = (x')^A$. Finally, we know from (4) that $r'_A[a] = s'_a[A]$. Combining the equations together, we get the following real system:
$$
\begin{aligned}
y'_h &= x' && (h \in H) \\
r'_A[a] &= s'_a[A] && (a \in A) \\
r'_A[h] &= (x')^A && (h \in H)
\end{aligned}
$$

We now move to the simulator. Recall that the simulator should turn the system defined by WCast and Int into one equivalent to the real-world system. To this end, the simulator should take $y'_A$ from the ideal functionality and $s'_A$ from the external environment as input, and output $r'_A$. Notice that $y'_h = x'$ in the real system is defined just as in the equations for the ideal functionality WCast when combined with the (honest) dealer interface Int. (In fact, $y'_a = x'$ also for $a \in A$.) The other variables $r'_A$ can be easily set by the simulator, as shown in Figure 15 (left). It is immediate to check that (Sim | Int | WCast) is equivalent to the real-world system.

$$
\begin{aligned}
&\mathrm{Sim}(y'_A, s'_A) = (r'_A): \\
&\quad r'_A[a] = s'_a[A] \quad (a \in A) \\
&\quad r'_A[h] = y'_A \quad (h \in H)
\end{aligned}
\qquad
\begin{aligned}
&\mathrm{Sim}'(y'_A, s'_0, s'_A) = (x', w, r'_A): \\
&\quad r'_A[a] = s'_a[A] \quad (a \in A) \\
&\quad r'_A[h] = (s'_0[h])^A \quad (h \in H) \\
&\quad x' = \textstyle\bigvee_{h \in H} t(s'_A[h], s'_0[H]) \\
&\quad w[h] = \big(t(s'_A[h], s'_0[H]) > \bot\big) \quad (h \in H)
\end{aligned}
$$

Figure 15: Real world systems and simulators for the weak broadcast protocol. Honest dealer case (left) and dishonest dealer case (right)

Figure 16: Security of the weak multicast protocol, when the dealer is dishonest. Real world execution on the left. Simulated attack in the ideal world on the right.

4 Verifiable Secret Sharing

Let $\mathbb{F}_t[X]$ be the set of all polynomials of degree at most $t$ over a finite field $\mathbb{F}$ such that$^8$ $\{0, 1, \ldots, n\} \subseteq \mathbb{F}$. We consider the $n$-party verifiable secret sharing (VSS) functionality. It takes as input from a dealer a degree-$t$ polynomial $p \in \mathbb{F}_t[X]$ and, for all $i \in [n]$, outputs the evaluation $p(i)$ to the $i$-th party. The formal definition of $\mathrm{VSS} : \mathbb{F}_t[X]_\bot \to \mathbb{F}_\bot^n$ is given in Figure 18 (left), where by convention $\bot(x) = \bot$ for all $x$.

We devise a protocol implementing the VSS functionality on top of two kinds of functionalities. The first is a point-to-point network functionality Net, defined as in the previous section, that allows the $n$ parties to exchange elements from $\mathbb{F}$. The second consists of two other auxiliary functionalities. The protocol is based on the one by [5]. Even though its complexity is exponential in $n$, we have chosen to present this protocol due to its simplicity.

The first auxiliary functionality (Graph) grants all parties access to the adjacency matrix of an $n$-vertex directed graph (with loops). Each party $i \in [n]$ can add outgoing edges to vertex $i$, but not to any other vertex $j \neq i$. Formally, $\mathrm{Graph} : \{\bot, \top\}^n \times \cdots \times \{\bot, \top\}^n \to \{\bot, \top\}^{n \times n}$ is given in Figure 18 (center). Setting $G[i, j] = \top$ is interpreted as including an edge from $i$ to $j$ in the graph. Graph can be immediately implemented using $n$ copies of a broadcast functionality, where a different party acts as the sender in each copy.

We also assume the availability of an additional unidirectional network functionality $\mathrm{Net}' : (\mathbb{F}_t[X]^2_\bot)^n \to (\mathbb{F}_t[X]^2_\bot)^n$. It allows the VSS dealer to send to each party a pair of polynomials of degree at most $t$; see Figure 18 (right).

$^8$This assumption is not really necessary; we could replace $\{0, 1, \ldots, n\}$ with $\{0, x_1, \ldots, x_n\}$ for any $n$ distinct nonzero field elements $x_1, \ldots, x_n$.

The VSS protocol. We turn to the actual protocol securely implementing the VSS functionality. We first define some auxiliary functions. For any subset $C \subseteq [n]$, let $\mathrm{clique}_C : \{\bot, \top\}^{n \times n} \to \{\bot, \top\}$ be the function $\mathrm{clique}_C(G) = \bigwedge_{i, j \in C} G[i, j]$. This function is clearly monotone, and tests if $C$ is a clique in $G$. For any set $A$, we equip the set $A_\bot$ with a monotone equality-test function $\mathrm{eq} : A_\bot \times A_\bot \to \{\bot, \top\}$, where $\mathrm{eq}(x, y) = (x = y \neq \bot)$. Monotonicity follows from the fact that all the pairs $(x, x)$ with $\mathrm{eq}(x, x) = \top$ are maximal elements in $A_\bot \times A_\bot$.

For any $S \subseteq [n]$ of size $|S| \geq t + 1$, and $r \in \mathbb{F}_\bot^n$, let $\mathrm{interpolate}_S(r) \in \mathbb{F}_t[X]_\bot$ be defined as follows. If there is a polynomial $h \in \mathbb{F}_t[X]$ such that $h(S) = r[S]$, then $\mathrm{interpolate}_S(r)$ is this (unique) polynomial; otherwise $\mathrm{interpolate}_S(r) = \bot$. For $C \subseteq [n]$, define also a monotone function $\mathrm{interpolate}_{C,t} : \mathbb{F}_\bot^n \to \mathbb{F}_t[X]_\bot^\top$, where $\mathrm{interpolate}_{C,t}(r) = \bigvee \{\mathrm{interpolate}_S(r) : S \subseteq C, |S| = |C| - t\}$. Notice that $\mathrm{interpolate}_{C,t}(r) = \bot$ if no interpolating polynomial exists, while $\mathrm{interpolate}_{C,t}(r) = \top$ if there are multiple solutions.

If $n \geq 4t + 1$ and $|C| \geq n - t$, then $\top$ never occurs. Indeed, let $S, S' \subseteq C$ be such that $|S| = |S'| = |C| - t$, and such that both $\mathrm{interpolate}_S(r)$ and $\mathrm{interpolate}_{S'}(r)$ differ from $\bot$. Since $|S \cap S'| \geq |C| - 2t \geq n - 3t \geq t + 1$, we must have $\mathrm{interpolate}_S(r) = \mathrm{interpolate}_{S'}(r)$. This follows from the fact that two polynomials of degree at most $t$ agreeing at $t + 1$ points are necessarily equal. For future reference, this is summarized by the following lemma.


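Figure 17: Security of the weak multicast protocol, when the dealer is honest. Real world execution on the left. Simulated attack in the ideal world on the right.

$$
\begin{aligned}
&\mathrm{VSS}(p) = (p_1, \ldots, p_n): \\
&\quad p_i = p(i) \quad (i = 1, \ldots, n)
\end{aligned}
\qquad
\begin{aligned}
&\mathrm{Graph}(G_1, \ldots, G_n) = G: \\
&\quad G[i, j] = G_i[j] \quad (i, j = 1, \ldots, n)
\end{aligned}
\qquad
\begin{aligned}
&\mathrm{Net}'(s') = (r'_1, \ldots, r'_n): \\
&\quad r'_i = s'[i] \quad (i = 1, \ldots, n)
\end{aligned}
$$

Figure 18: The VSS functionality, and two auxiliary functions used to realize it.

$$
\begin{aligned}
&\mathrm{Dealer}(p) = s': \\
&\quad f \leftarrow \mathcal{P}_t(p) \\
&\quad s'[i] = (f(\cdot, i), f(i, \cdot)) \quad (i = 1, \ldots, n)
\end{aligned}
\qquad
\begin{aligned}
&\mathrm{Player}[i](r'_i, G, r_i) = (p_i, s_i, G_i) \quad (i = 1, \ldots, n): \\
&\quad (g_i, h_i) = r'_i \\
&\quad s_i[j] = g_i(j) \quad (j = 1, \ldots, n) \\
&\quad G_i[j] = \mathrm{eq}(r_i[j], h_i(j)) \quad (j = 1, \ldots, n) \\
&\quad o_i = \textstyle\bigvee_{C \subseteq [n],\, |C| \geq n - t} \big[\mathrm{clique}_C(G) \wedge \mathrm{interpolate}_{C,t}(r_i)\big] \\
&\quad p_i = o_i(0)
\end{aligned}
$$

Figure 19: The VSS protocol.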


Lemma 1 Let $n, t$ be such that $n \geq 4t + 1$, and let $C \subseteq [n]$ be such that $|C| \geq n - t$. Then $\mathrm{interpolate}_{C,t}(r) \neq \top$ for all $r \in \mathbb{F}_\bot^n$.
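The interpolation functions and Lemma 1 can be exercised on small examples. The Python sketch below makes several assumptions purely for illustration:
- the field is the prime field GF(97);
- the evaluation points are $1, \ldots, n$;
- polynomials are coefficient tuples, lowest degree first;
- `None`/`TOP` stand for ⊥/⊤.

It computes $\mathrm{interpolate}_S$ by Lagrange interpolation through $t + 1$ of the points, followed by a check of the remaining ones. It computes $\mathrm{interpolate}_{C,t}$ as a join over subsets, so it is exponential in $n$.

```python
import random
from itertools import combinations

P = 97  # hypothetical prime: the sketch works over GF(97)


class _Top:
    """Sentinel for TOP (several distinct interpolating polynomials)."""

    def __repr__(self) -> str:
        return "TOP"


BOT, TOP = None, _Top()


def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return a if a == b else TOP


def poly_eval(coeffs, x):
    """Horner evaluation of sum(coeffs[d] * x**d) mod P (coefficients lowest degree first)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def times_linear(poly, root):
    """Multiply a coefficient list by (X - root) mod P."""
    out = [0] * (len(poly) + 1)
    for d, c in enumerate(poly):
        out[d + 1] = (out[d + 1] + c) % P
        out[d] = (out[d] - root * c) % P
    return out


def lagrange(points):
    """Coefficients of the unique polynomial of degree < len(points) through the given points."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = times_linear(basis, xj)
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse needs Python 3.8+
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs


def interpolate_S(r, S, t):
    """interpolate_S(r): the polynomial of degree <= t matching r on S, or BOT if none exists."""
    if any(r[j] is BOT for j in S):
        return BOT
    pts = [(j, r[j]) for j in S]
    h = lagrange(pts[: t + 1])  # fit t + 1 points, then check the remaining ones
    return tuple(h) if all(poly_eval(h, j) == v for j, v in pts) else BOT


def interpolate_Ct(r, C, t):
    """interpolate_{C,t}(r): join of interpolate_S(r) over all S subset of C with |S| = |C| - t."""
    out = BOT
    for S in combinations(sorted(C), len(C) - t):
        out = join(out, interpolate_S(r, S, t))
    return out


n, t = 5, 1                          # n >= 4t + 1
p = [3, 2]                           # p(X) = 3 + 2X
r = {i: poly_eval(p, i) for i in range(1, n + 1)}
r[4] = (r[4] + 1) % P                # one wrong evaluation
print(interpolate_Ct(r, set(range(1, n + 1)), t))  # (3, 2)

# Lemma 1 on random inputs: TOP never appears when n >= 4t + 1 and |C| >= n - t.
rng = random.Random(0)
for _ in range(100):
    r_rand = {i: rng.randrange(P) for i in range(1, n + 1)}
    for size in (n - t, n):
        for C in combinations(range(1, n + 1), size):
            assert interpolate_Ct(r_rand, set(C), t) is not TOP
```

Now take $n = 4$ and $t = 1$, which violates $n \geq 4t + 1$, and let $C = \{1, 2, 3\}$. The three points $(1, 0), (2, 0), (3, 1)$ already yield ⊤, because each pair of points determines a different line.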

In the following, denote by $\mathbb{F}_t[X, Y]$ the set of polynomials $f = f(X, Y)$ in $\mathbb{F}[X, Y]$ with degree at most $t$ in each of the variables $X$ and $Y$. For any $p \in \mathbb{F}_t[X]_\bot$, let $\mathcal{P}_t(p)$ be (the uniform distribution over) the set of bivariate polynomials $f \in \mathbb{F}_t[X, Y]$ such that $f(\cdot, 0) = p$. (By convention, if $p = \bot$, then $\mathcal{P}_t(p) = \{\bot\}$.)

The protocol consists of a dealer Dealer which, on input a polynomial $p$, first chooses a random bivariate polynomial $f$ in $\mathcal{P}_t(p)$. (This is the only random choice of the entire protocol.) For all $i \in [n]$, it sends the two polynomials $g_i = f(\cdot, i)$ and $h_i = f(i, \cdot)$ to player $i$. The usual convention applies: if $f = \bot$, then $f(\cdot, i) = f(i, \cdot) = \bot$.

The players then determine whether the polynomials they received are consistent. This is achieved by having each honest party $i$ send $g_i(j)$ to player $j$, who, in turn, checks consistency with $h_j(i)$. (Note that if the polynomials are correct, then $g_i(j) = h_j(i) = f(j, i)$.) If the consistency check is successful, player $j$ raises the entry $G[j, i]$ to $\top$.

Each honest party $i$ then waits for a clique $C \subseteq [n]$ of size (at least) $n - t$ to appear in the directed graph defined by $G$. It then computes the polynomial $o_i \in \mathbb{F}_t[X]_\bot^\top$ obtained by interpolating the values $g_j(i)$ received from other parties. (Here $\top$ represents an error condition meaning that multiple interpolating polynomials were found. As we will show, it should not really occur in actual executions.) As soon as such a polynomial is found, the honest party terminates with output $p_i = o_i(0)$. A formal specification is given in Figure 19.
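The dealer's sharing step and the pairwise consistency check can be sketched as follows. This is a hypothetical illustration, not a full implementation of the protocol. It makes these assumptions:
- the field is GF(97);
- $f$ is represented by its coefficient matrix `c[a][b]` of $X^a Y^b$;
- players are numbered $1, \ldots, n$;
- the corrupted sender is modeled only through a function `lie(i, j)` giving the value it sends to $j$.

It checks that, with an honest dealer, the honest players form a clique in $G$. It also checks that $h_i(0) = f(i, 0) = p(i)$.

```python
import random

P = 97  # hypothetical prime field GF(97)


def sample_bivariate(p, t, rng):
    """Sample f from P_t(p): c[a][b] is the coefficient of X^a Y^b, and f(X, 0) = p(X)."""
    c = [[rng.randrange(P) for _ in range(t + 1)] for _ in range(t + 1)]
    for a in range(t + 1):
        c[a][0] = p[a] % P  # pin the Y^0 column to the coefficients of p
    return c


def f_eval(c, x, y):
    m = len(c)
    return sum(c[a][b] * pow(x, a, P) * pow(y, b, P) for a in range(m) for b in range(m)) % P


def g_poly(c, i):
    """g_i = f(., i), as coefficients in X."""
    return [sum(c[a][b] * pow(i, b, P) for b in range(len(c))) % P for a in range(len(c))]


def h_poly(c, i):
    """h_i = f(i, .), as coefficients in Y."""
    return [sum(c[a][b] * pow(i, a, P) for a in range(len(c))) % P for b in range(len(c))]


def ev(coeffs, x):
    return sum(co * pow(x, d, P) for d, co in enumerate(coeffs)) % P


def consistency_graph(n, c, corrupt, lie):
    """Honest rows of G: G[j][i] = eq(r_j[i], h_j(i)), where r_j[i] = g_i(j) if sender i is honest.

    A corrupted sender i sends lie(i, j) instead. Rows of corrupted players are left False here,
    since in the protocol they are chosen by the adversary.
    """
    G = [[False] * (n + 1) for _ in range(n + 1)]  # players are 1..n; index 0 is unused
    for j in range(1, n + 1):
        if j in corrupt:
            continue
        hj = h_poly(c, j)
        for i in range(1, n + 1):
            r_ji = lie(i, j) if i in corrupt else ev(g_poly(c, i), j)
            G[j][i] = r_ji == ev(hj, i)
    return G


n, t = 5, 1
rng = random.Random(1)
p = [3, 2]                                   # secret polynomial p(X) = 3 + 2X
c = sample_bivariate(p, t, rng)
H = [1, 2, 3, 4]                             # player 5 is corrupted and always sends 0
G = consistency_graph(n, c, corrupt={5}, lie=lambda i, j: 0)

assert all(G[i][j] for i in H for j in H)    # the honest players form a clique of size n - t
assert all(ev(h_poly(c, i), 0) == ev(p, i) for i in range(1, n + 1))  # h_i(0) = f(i, 0) = p(i)
```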

In  the  following,  we  turn  to  proving  security  of  the  protocol.  The  analysis  consists  of  two  cases. 


Honest dealer security. We start by analyzing the security of the above protocol in the case where the dealer is honest. For all $A \subseteq [n]$ where $|A| = t$ and $n \geq 4t + 1$, define $H = [n] \setminus A$. Suppose the players in the set $A$ are corrupted, and thus the players in $H$ are honest. Then an execution of the VSS protocol with honest dealer is given by the system (Dealer | Player[H] | Net' | Net | Graph). This system has inputs $p, s_A, G_A$ and outputs $r'[A], r_A, G, p_H$, and is given in Figure 20.

Figure 20: Security of the VSS protocol when the dealer is honest.

We proceed by combining all the equations together and simplifying the result, until we obtain a system of equations that can be easily simulated. We use the above definition of the system to obtain the following equations describing (Dealer | Player[H] | Net' | Net | Graph). For any $i, j \in [n]$, and any $h \in H$, $a \in A$, we have


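$$
\begin{aligned}
f &\xleftarrow{\$} \mathcal{V}_t(p) & r'_i &= \bigl(f(\cdot,i),\, f(i,\cdot)\bigr)\\
G[h,j] &= \mathrm{eq}\bigl(r_h[j],\, f(h,j)\bigr) & r_i[h] &= f(i,h)\\
G[a,j] &= G_a[j] & r_i[a] &= s_a[i]\\
o_h &= \bigvee_{C \subseteq [n],\ |C| \geq n-t} \bigl[\mathrm{clique}_C(G) \wedge \mathrm{interpolate}_C(r_h)\bigr] & p_h &= o_h(0)\,.
\end{aligned}
$$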

For convenience, some simplifications have already been made. First, $g_i$ and $h_i$ have been replaced by $f(\cdot,i)$ and $f(i,\cdot)$, respectively. Second, we used the facts that $r'_i = s'[i]$ and $r_i[h] = s_h[i] = f(i,h)$ for all $h \in H$ and all $i \in [n]$, by the definitions of the network functionalities Net’ and Net. Finally, we have set the values of $G[\cdot,\cdot]$ according to the protocol specification (for honest players) and the inputs $G_a$ of players $a \in A$.

In order to further simplify the system, we claim that $p_h = p(h)$ for $h \in H$. If $p = \bot$, then this is easy to see because $f = \bot$ and $G[h,j] = \mathrm{eq}(r_h[j], \bot) = \bot$. Therefore, we necessarily have $\mathrm{clique}_C(G) = \bot$ for all $C \subseteq [n]$ with $|C| \geq n - t$, since $|C \cap H| \geq n - 2t > 0$. So, we only need to prove the claim for $p \neq \bot$. Notice that the equations $G[h,j] = \mathrm{eq}(r_h[j], f(h,j))$, depending on whether $j = h' \in H$ or $j = a \in A$, can be replaced by the set of equations

$$
\begin{aligned}
G[h,h'] &= \mathrm{eq}\bigl(r_h[h'],\, f(h,h')\bigr) = \mathrm{eq}\bigl(f(h,h'),\, f(h,h')\bigr) = \top\,,\\
G[h,a] &= \mathrm{eq}\bigl(r_h[a],\, f(h,a)\bigr) = \mathrm{eq}\bigl(s_a[h],\, f(h,a)\bigr)\,.
\end{aligned}
$$

This in particular implies that $C = H$ is a clique of size at least $n - t$ in the graph defined by $G$, i.e., we have $\mathrm{clique}_H(G) = \top$. Also, since $r_h[h'] = f(h,h')$, we necessarily have

$$
o_h \sqsupseteq \mathrm{clique}_H(G) \wedge \mathrm{interpolate}_H(r_h) = \top \wedge f(h,\cdot) = f(h,\cdot)
$$

by Lemma 1. Now, let $S \subseteq C$ be any sets such that $|C| \geq n - t$ and $|S| = |C| - t \geq n - 2t$. Since $r_h[h'] = f(h,h')$ for all $h' \in H$ and $|S \cap H| \geq n - 3t \geq t + 1$, we have $\mathrm{interpolate}_S(r_h) \sqsubseteq f(h,\cdot)$, and, by Lemma 1, $\mathrm{interpolate}_C(r_h) = f(h,\cdot)$. This proves that $o_h = \mathrm{interpolate}_C(r_h) = f(h,\cdot)$, and $p_h = o_h(0) = f(h,0) = p(h)$.
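The argument above hinges on how $\mathrm{interpolate}_C$ behaves when at most $t$ of the points in $r_h$ are wrong. The Python sketch below models it as we read its use here, namely as the join, over all $S \subseteq C$ with $|S| = |C| - t$, of $\mathrm{interpolate}_S(r_h)$, with `None` standing for $\bot$ and the string `"TOP"` for $\top$. The precise definition and its properties are those of Lemma 1, so treat this model as an assumption; the field size `Q = 101` and the sample row polynomial are our own choices.

```python
from itertools import combinations

Q = 101      # toy prime field size (hypothetical choice)
TOP = "TOP"  # stands for the error value ⊤; None stands for ⊥


def poly_eval(coeffs, x):
    """Horner evaluation of sum(coeffs[k] * x**k) over GF(Q)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc


def lagrange(points):
    """Coefficients of the unique polynomial of degree < len(points) through points."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                # multiply basis by (X - xj)
                basis = [(a - xj * b) % Q for a, b in zip([0] + basis, basis + [0])]
                denom = denom * (xi - xj) % Q
        scale = yi * pow(denom, -1, Q) % Q
        result = [(r + scale * b) % Q for r, b in zip(result, basis)]
    return result


def interpolate_S(r, S, t):
    """Degree-<=t polynomial through {(j, r[j]) : j in S}, or None if none exists."""
    S = sorted(S)
    cand = lagrange([(j, r[j]) for j in S[: t + 1]])
    ok = all(poly_eval(cand, j) == r[j] % Q for j in S[t + 1:])
    return cand if ok else None


def join(values):
    """Join in the flat domain: ⊥ is neutral, two distinct proper values give ⊤."""
    out = None
    for v in values:
        if v is None:
            continue
        if out is None:
            out = v
        elif out != v:
            return TOP
    return out


def interpolate_C(r, C, t):
    """Join of interpolate_S(r) over all S ⊆ C with |S| = |C| - t (our reading)."""
    return join(interpolate_S(r, S, t) for S in combinations(sorted(C), len(C) - t))


if __name__ == "__main__":
    t = 1
    row = [4, 9]                               # f(h, ·) = 4 + 9Y, chosen arbitrarily
    r = {j: poly_eval(row, j) for j in range(1, 6)}
    r[3] = 50                                  # corrupted player 3 sends a wrong value
    print(interpolate_C(r, {1, 2, 3, 4}, t))   # recovers [4, 9]
    print(interpolate_C(r, {1, 2, 3}, t))      # |C| too small: TOP
```

With $n = 5$, $t = 1$ and $|C| = n - t = 4$, the single wrong point is outvoted; with a set of size 3, two-point subsets interpolate to different lines and the join becomes $\top$, which is why the size bound on $C$ matters.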

Summarizing,  the  real  system  is  described  by  the  following  set  of  equations: 
$$
\begin{aligned}
f &\xleftarrow{\$} \mathcal{V}_t(p) & r'_a &= \bigl(f(\cdot,a),\, f(a,\cdot)\bigr)\\
G[h,h'] &= (p \sqsupset \bot) & r_a[a'] &= s_{a'}[a]\\
G[h,a] &= \mathrm{eq}\bigl(s_a[h],\, f(h,a)\bigr) & r_a[h] &= f(a,h)\\
G[a,j] &= G_a[j] & p_h &= p(h)\,.
\end{aligned}
$$

Notice that this is exactly how $p_H$ is defined by the VSS functionality. So, in order to prove security, it is enough to give a simulator Sim that, on input $p_A, s_A, G_A$, outputs $G$, $r_A$ and $r'_A$ as defined in the above system of equations. See Figure 20 (right).

The problem faced by the simulator is that it cannot test $p \sqsupset \bot$ and generate $f$ as in the equations, because it does not know the value of $p$; it only has the partial information $p_A = p(A)$. The first condition $p \sqsupset \bot$ is easy to check because it is equivalent to $p_a = p(a) \sqsupset \bot$ for any $a \in A$. In order to complete the simulation, we observe that the equations only depend on the $2t$ polynomials $f(\cdot,A)$ and $f(A,\cdot)$. The next lemma shows that, given $p(A)$, the polynomials $f(\cdot,A)$ and $f(A,\cdot)$ are statistically independent of $p$, and that their distribution can be easily sampled.

Lemma 2 Let $p \in \mathbb{F}_t[X]$, let $f \xleftarrow{\$} \mathcal{V}_t(p)$, and for all $a \in A$, let $g_a = f(\cdot,a)$ and $h_a = f(a,\cdot)$. The conditional distribution of $(g_a, h_a)_{a \in A}$ given $p(A)$ is statistically independent of $p$, and it can be generated by the following algorithm $\mathrm{Samp}(p_A)$: first pick polynomials $h_a \in \mathbb{F}_t[Y]$ independently and uniformly at random subject to the constraint $h_a(0) = p_a$. Then, pick $g_a \in \mathbb{F}_t[X]$, also independently and uniformly at random, subject to the constraint $g_a(A) = h_A(a)$, i.e., $g_a(a') = h_{a'}(a)$ for all $a' \in A$.
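The algorithm Samp of Lemma 2 is short enough to sketch directly. The Python code below is a minimal illustration over a toy prime field `Q = 101` (our choice), assuming $|A| = t$ and $0 \notin A$, with polynomials stored as coefficient lists; the helper names are hypothetical. Each $h_a$ is drawn by fixing its constant term to $p_a$ and choosing the other $t$ coefficients at random. For $g_a$, the $t$ constraints $g_a(a') = h_{a'}(a)$ leave one degree of freedom, which we spend on a uniformly random value at $X = 0$ before interpolating; since evaluation at $t + 1$ distinct points is a bijection on $\mathbb{F}_t[X]$, this yields the uniform distribution subject to the constraints. The $\bot$ convention for $p = \bot$ is not modelled.

```python
import random

Q = 101  # toy prime field size (hypothetical choice)


def poly_eval(coeffs, x):
    """Horner evaluation over GF(Q)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc


def lagrange(points):
    """Coefficients of the unique polynomial of degree < len(points) through points."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = [(a - xj * b) % Q for a, b in zip([0] + basis, basis + [0])]
                denom = denom * (xi - xj) % Q
        scale = yi * pow(denom, -1, Q) % Q
        result = [(r + scale * b) % Q for r, b in zip(result, basis)]
    return result


def samp(p_A, t, rng=random):
    """Sketch of Samp(p_A): p_A maps each a in A (|A| = t, 0 not in A) to p_a."""
    A = sorted(p_A)
    # h_a uniform in F_t[Y] subject to h_a(0) = p_a.
    h = {a: [p_A[a] % Q] + [rng.randrange(Q) for _ in range(t)] for a in A}
    g = {}
    for a in A:
        # g_a uniform in F_t[X] subject to g_a(a') = h_{a'}(a) for every a' in A;
        # the one remaining degree of freedom is a uniform value at X = 0.
        pts = [(a2, poly_eval(h[a2], a)) for a2 in A] + [(0, rng.randrange(Q))]
        g[a] = lagrange(pts)
    return g, h


if __name__ == "__main__":
    g, h = samp({1: 5, 3: 7}, t=2, rng=random.Random(0))
    for a in (1, 3):
        for a2 in (1, 3):
            assert poly_eval(g[a], a2) == poly_eval(h[a2], a)  # g_a(a') = h_{a'}(a)
    print(g, h)
```

The assertions only confirm the constraints of the lemma; the statistical-independence claim is what the lemma itself establishes and is not something a single run can check.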


Using the algorithm from the lemma, we obtain the following simulator Sim:


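$$
\begin{aligned}
&\mathrm{Sim}(p_A, s_A, G_A) = (G, r_A, r'_A):\\
&\quad (g_A, h_A) \xleftarrow{\$} \mathrm{Samp}(p_A)\\
&\quad r'_A = (g_A, h_A)\\
&\quad G[h,h'] = \textstyle\bigwedge_{a \in A} (p_a \sqsupset \bot) && (h, h' \in H)\\
&\quad G[h,a] = \mathrm{eq}\bigl(s_a[h],\, g_a(h)\bigr) && (h \in H,\ a \in A)\\
&\quad G[a,j] = G_a[j] && (a \in A,\ j \in [n])\\
&\quad r_a[h] = h_a(h) && (h \in H,\ a \in A)\\
&\quad r_a[a'] = s_{a'}[a] && (a, a' \in A)
\end{aligned}
$$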

As usual, if $p = \bot$, then $p_A = \bot^A$ and, by convention, $\mathrm{Samp}(p_A) = (\bot, \bot)$.

Dishonest dealer security. We now look at the case where the dealer is not honest. As above, for all $A \subseteq [n]$ where $|A| = t$ and $n \geq 4t + 1$, define $H = [n] \setminus A$. When the players in the set $A$ are corrupted (and thus the players in $H$ are honest), an execution of the VSS protocol with dishonest dealer is given by the system (Player[H] | Net’ | Net | Graph) with inputs $s', s_A, G_A$, and outputs $r'_A, r_A, p_H$ and $G$. As above, we start with an equational description of the system, and will simplify it below into a form where the construction of a corresponding simulator becomes obvious. For all $i, j \in [n]$, $h, h' \in H$, and $a \in A$, we have


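$$
\begin{aligned}
(g_h, h_h) &= s'[h] & r'_a &= s'[a]\\
G[h,h'] &= \mathrm{eq}\bigl(g_{h'}(h),\, h_h(h')\bigr) & r_i[h] &= g_h(i)\\
G[h,a] &= \mathrm{eq}\bigl(s_a[h],\, h_h(a)\bigr) & r_i[a] &= s_a[i]\\
G[a,j] &= G_a[j] & &\\
o_h &= \bigvee_{C \subseteq [n],\ |C| \geq n-t} \bigl[\mathrm{clique}_C(G) \wedge \mathrm{interpolate}_C(r_h)\bigr] & p_h &= o_h(0)\,.
\end{aligned}
$$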
Notice that we have already undertaken several easy simplification steps, defining the variables which are part of the output as functions of the system inputs and of the auxiliary variables $g_H$, $h_H$, $o_H$ and $r_H$. Specifically, to obtain the above equations from the original system specification, we have used $r_i[j] = s_j[i]$, where $s_h[i] = g_h(i)$, together with $r'_i = s'[i]$ and the definition of $G[h,i]$, distinguishing between the cases $i = a \in A$ and $i = h' \in H$.

Recall that in order to prove security, we need to give a simulator Sim with inputs $s', G_A, s_A, p_A$ and outputs $r'_A, r_A, G$ and $p$ such that (VSS | Sim) is equivalent to the above system. (See Figure 21.) Notice that in the system describing a real execution, all variables except $p_h$ (and the intermediate value $o_h$) are defined as functions of the inputs given to the simulator. So, Sim can set all these variables just as in the system describing the real execution. The only difference between a real execution and a simulation is that the simulator is not allowed to set $p_h$ directly. Rather, it must specify a polynomial $p \in \mathbb{F}_t[X]_\bot$, which implicitly defines $p_h = p(h)$ through the equations of the ideal VSS functionality. In other words, to complete the description of the simulator we need to show that Sim can determine such a polynomial $p$ from its inputs $s', G_A, s_A, p_A$ such that $p(h)$ equals $p_h$ as defined by the above system of equations.

Before  defining  p,  we  recall  the  following  lemma  whose  simple  proof  is  standard  and  omitted: 

Lemma 3 Let $S$ be such that $|S| \geq t + 1$ and let $\{g_h, h_h\}_{h \in S}$ be a set of $2|S|$ polynomials of degree $t$. Then $g_h(h') = h_{h'}(h)$ holds for all $h, h' \in S$ if and only if there exists a unique polynomial $f \in \mathbb{F}_t[X,Y]$ such that $f(\cdot,h) = g_h$ and $f(h,\cdot) = h_h$ for all $h \in S$.

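For $T \subseteq H$ with $|T| \geq t + 1$, define $\mathrm{interpolate2}_T(s')$ to be the (unique) polynomial $f \in \mathbb{F}_t[X,Y]$ such that $f(\cdot,h) = g_h$ and $f(h,\cdot) = h_h$ for all $h \in T$, if it exists, and $\bot$ otherwise or if $s'[h] = \bot$ for some $h \in T$. Also, given $C \subseteq [n]$, define

$$
\mathrm{interpolate2}_C(s') = \bigvee \bigl\{\mathrm{interpolate2}_S(s') : S \subseteq C,\ |S| \geq |C| - t\bigr\}\,.
$$

Note that since $|C| \geq n - t$ and $n \geq 4t + 1$, $\mathrm{interpolate2}_C(s') \neq \top$. Indeed, for any two $S, S' \subseteq C$ such that both $\mathrm{interpolate2}_S(s')$ and $\mathrm{interpolate2}_{S'}(s')$ differ from $\bot$, we have $|S \cap S'| \geq |S| + |S'| - |C| \geq |C| - 2t \geq n - 3t \geq t + 1$, and hence $\mathrm{interpolate2}_S(s') = \mathrm{interpolate2}_{S \cap S'}(s') = \mathrm{interpolate2}_{S'}(s')$ by Lemma 3. We finally define the polynomial $p = f(\cdot,0)$, where

$$
f = \bigvee_{C \subseteq [n],\ |C| \geq n-t} \mathrm{clique}_C(G) \wedge \mathrm{interpolate2}_C(s')\,. \tag{7}
$$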
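To make $\mathrm{interpolate2}_T$ and the choice $p = f(\cdot,0)$ concrete, here is a Python sketch (toy field `Q = 101`, our choice) that checks the pairwise condition of Lemma 3 and, when it holds, rebuilds the bivariate polynomial coefficient by coefficient from the first $t + 1$ rows $h_h = f(h,\cdot)$. Bivariate polynomials are stored as `F[k][l]`, the coefficient of $X^k Y^l$, and `None` plays the role of $\bot$. The clique condition and the join over $C$ in (7) are not modelled, and all function names are hypothetical.

```python
Q = 101  # toy prime field size (hypothetical choice)


def poly_eval(coeffs, x):
    """Horner evaluation over GF(Q)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc


def lagrange(points):
    """Coefficients of the unique polynomial of degree < len(points) through points."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = [(a - xj * b) % Q for a, b in zip([0] + basis, basis + [0])]
                denom = denom * (xi - xj) % Q
        scale = yi * pow(denom, -1, Q) % Q
        result = [(r + scale * b) % Q for r, b in zip(result, basis)]
    return result


def row(F, x):
    """Coefficients (in Y) of f(x, ·)."""
    return [sum(F[k][l] * pow(x, k, Q) for k in range(len(F))) % Q for l in range(len(F[0]))]


def col(F, y):
    """Coefficients (in X) of f(·, y)."""
    return [sum(F[k][l] * pow(y, l, Q) for l in range(len(F[0]))) % Q for k in range(len(F))]


def interpolate2(S, g, h, t):
    """f with f(·, s) = g[s] and f(s, ·) = h[s] for all s in S, or None (⊥)."""
    S = sorted(S)
    assert len(S) >= t + 1
    # Condition of Lemma 3: g_s(s2) = h_{s2}(s) for all s, s2 in S.
    if any(poly_eval(g[s], s2) != poly_eval(h[s2], s) for s in S for s2 in S):
        return None
    # For each power Y^l, interpolate the coefficients h_s[l] over X.
    cols = [lagrange([(s, h[s][l]) for s in S[: t + 1]]) for l in range(t + 1)]
    F = [[cols[l][k] for l in range(t + 1)] for k in range(t + 1)]
    # Sanity check; by Lemma 3 it cannot fail once the condition above holds.
    if any(row(F, s) != h[s] or col(F, s) != g[s] for s in S):
        return None
    return F


if __name__ == "__main__":
    F = [[7, 2], [3, 5]]                       # f = 7 + 2Y + 3X + 5XY
    S = [1, 2, 4]
    g = {s: col(F, s) for s in S}
    h = {s: row(F, s) for s in S}
    rebuilt = interpolate2(S, g, h, t=1)
    print(rebuilt)                             # [[7, 2], [3, 5]]
    print(col(rebuilt, 0))                     # p = f(·, 0) -> [7, 3]
```

Perturbing any single row makes the pairwise check fail and the function returns `None`, mirroring the $\bot$ case of the definition.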


We first prove that $p \neq \top$. To this end, assume that $p \neq \bot$. Then $f \neq \bot$, and there must exist $C \subseteq [n]$ such that $\mathrm{clique}_C(G) = \top$. Let $S = C \cap H$. Note that for all $h, h' \in S$, since $G[h,h'] = \top$, it must be that $h_h(h') = g_{h'}(h)$. Therefore, since $|S| \geq n - 2t \geq 2t + 1$, by Lemma 3 there exists a unique polynomial $f_C$ such that $f_C(\cdot,h) = g_h$ and $f_C(h,\cdot) = h_h$ for all $h \in S$, and by the above

$$
f_C = \mathrm{interpolate2}_C(s') = \mathrm{interpolate2}_{C \cap H}(s')\,.
$$

Now assume that there exist two such cliques $C$ and $C'$, with $S = C \cap H$ and $S' = C' \cap H$. Then, since

$$
|S \cap S'| = |S| + |S'| - |S \cup S'| \geq 2(|C| - |A|) - |H| \geq n - 3|A| \geq t + 1\,, \tag{8}
$$

by Lemma 3 we necessarily have $f_C = f_{C'} = f$.
Figure 21: Security of the VSS protocol when the dealer is dishonest.


Finally, it is easy to see that $p(h) = p_h$ for all $h \in H$. Namely, if there exists $C \subseteq [n]$ with $\mathrm{clique}_C(G) = \top$, then $r_h[h'] = g_{h'}(h) = f(h,h')$ for all $h' \in C \cap H$, and therefore $o_h = \mathrm{interpolate}_C(r_h) = \mathrm{interpolate}_{C \cap H}(r_h) = f(h,\cdot)$, and thus $p_h = o_h(0) = f(h,0) = p(h)$.

We  therefore  conclude  that  the  real  system  is  equivalent  to  (Sim|  VSS)  where  Sim  is  the 
simulator  defined  by  the  following  equations: 


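$$
\begin{aligned}
&\mathrm{Sim}(s', G_A, s_A, p_A) = (r'_A, r_A, G, p):\\
&\quad r'_a = s'[a] && (a \in A)\\
&\quad r_a[h] = g_h(a) && (h \in H,\ a \in A)\\
&\quad r_a[a'] = s_{a'}[a] && (a, a' \in A)\\
&\quad G[h,h'] = \mathrm{eq}\bigl(g_{h'}(h),\, h_h(h')\bigr) && (h, h' \in H)\\
&\quad G[h,a] = \mathrm{eq}\bigl(s_a[h],\, h_h(a)\bigr) && (a \in A,\ h \in H)\\
&\quad G[a,j] = G_a[j] && (a \in A,\ j \in [n])\\
&\quad f = \textstyle\bigvee_{C \subseteq [n],\ |C| \geq n-t} \mathrm{clique}_C(G) \wedge \mathrm{interpolate2}_C(s')\\
&\quad p = f(\cdot,0)
\end{aligned}
$$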


5  Conclusions 

Recognizing  the  inherent  hardness  of  delivering  security  proofs  for  complex  cryptographic  protocols 
that  are  both  precise  and  intuitive  within  existing  security  frameworks,  we  have  presented  a  new 
framework  to  study  the  security  of  multi-party  computations  based  on  equational  descriptions 
of  interactive  processes.  Our  framework  allows  a  simple  and  intuitive,  yet  completely  formal, 
description  of  interactive  processes  via  sets  of  equations,  and  its  foundations  rely  on  tools  from 
programming-language  theory  and  domain  theory.  Beyond  its  simplicity,  our  framework  completely 
avoids  explicit  addressing  of  non-determinism  within  cryptographic  security  proofs,  making  security 
proofs  a  matter  of  simple  equational  manipulations  over  precise  mathematical  structures.  As  a 
case  study,  we  have  presented  simple  security  analyses  of  (variants  of)  two  classical  asynchronous 
protocols  within  our  framework,  Bracha’s  broadcast  protocol  [8]  and  the  Ben-Or,  Canetti,  Goldreich 
VSS  protocol  [5]. 

We are convinced that our work will open up several directions for future work. First, while the results in this paper are presented for the special case of perfect security, a natural next step is to extend the framework to statistical and even computational security. Moreover, while the expressiveness of our framework (i.e., the monotonicity restrictions on protocols) remains to be thoroughly investigated, most distributed protocols we have examined so far seem to admit a representation within our framework, possibly after minor modifications, which often resulted in a simpler protocol description. For this reason, our thesis is that this holds true for all protocols of interest, and that non-monotonicity, as a source of unnecessary complexity and proof mistakes, should be avoided whenever possible.
Abstract. The wiretap channel is a setting where one aims to provide
information-theoretic  privacy  of  communicated  data  based  solely  on  the 
assumption  that  the  channel  from  sender  to  adversary  is  “noisier”  than 
the  channel  from  sender  to  receiver.  It  has  developed  in  the  Information 
and  Coding  (I&C)  community  over  the  last  30  years  largely  divorced  from 
the  parallel  development  of  modern  cryptography.  This  paper  aims  to 
bridge  the  gap  with  a  cryptographic  treatment  involving  advances  on  two 
fronts,  namely  definitions  and  schemes.  On  the  first  front  (definitions), 
we  explain  that  the  mis-r  definition  in  current  use  is  weak  and  propose 
two  alternatives:  mis  (based  on  mutual  information)  and  ss  (based  on 
the  classical  notion  of  semantic  security).  We  prove  them  equivalent, 
thereby  connecting  two  fundamentally  different  ways  of  defining  privacy 
and  providing  a  new,  strong  and  well-founded  target  for  constructions. 
On  the  second  front  (schemes),  we  provide  the  first  explicit  scheme  with 
all  the  following  characteristics:  it  is  proven  to  achieve  both  security  (ss 
and  mis,  not  just  mis-r)  and  decodability;  it  has  optimal  rate;  and  both 
the encryption and decryption algorithms are proven to be polynomial-time.


1  Introduction 

The  wiretap  channel  is  a  setting  where  one  aims  to  obtain  information-theoretic 
privacy  under  the  sole  assumption  that  the  channel  from  sender  to  adversary  is 
“noisier”  than  the  channel  from  sender  to  receiver.  Introduced  by  Wyner,  Csiszar 
and  Korner  in  the  late  seventies  [41,14],  it  has  developed  in  the  Information 
and  Coding  (I&C)  community  largely  divorced  from  the  parallel  development  of 
modern  cryptography.  This  paper  aims  to  bridge  the  gap  with  a  cryptographic 
treatment  involving  advances  on  two  fronts. 

The first is definitions. We explain that the security definition in current use, which we call mis-r (mutual-information security for random messages), is weak and insufficient to provide security of applications. We suggest strong, new definitions. One, which we call mis (mutual-information security), is an extension of mis-r and thus rooted in the I&C tradition and intuition. Another, semantic security (ss), adapts the cryptographic “gold standard” emanating from [19] and universally accepted in the cryptographic community. We prove the two equivalent, thereby connecting two fundamentally different ways of defining privacy and providing a new, strong and well-founded target for constructions.

The second is schemes. Classically, the focus of the I&C community has been proofs of existence of mis-r schemes of optimal rate. The proofs are non-constructive, so the schemes may not even be explicit, let alone polynomial time. Recently, there has been progress towards explicit mis-r schemes [30, 29, 21]. We take this effort to its full culmination by providing the first explicit construction of a scheme with all the following characteristics: it is proven to achieve security (not just mis-r but ss and mis) over the adversary channel; it is proven to achieve decodability over the receiver channel; it has optimal rate; and both the encryption and decryption algorithms are proven to be polynomial-time.

Today  the  possibility  of  realizing  the  wiretap  setting  in  wireless  networks  is 
receiving  practical  attention  and  fueling  a  fresh  look  at  this  area.  Our  work  helps 
guide  the  choice  of  security  goals  and  schemes.  Let  us  now  look  at  all  this  in 
more  detail. 

The wiretap model. The setting is depicted in Fig. 1. The sender applies to her message $M$ a randomized encryption algorithm $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ to get what we call the sender ciphertext $X \xleftarrow{\$} \mathcal{E}(M)$. This is transmitted to the receiver over the receiver channel ChR, so that the latter gets a receiver ciphertext $Y \xleftarrow{\$} \mathrm{ChR}(X)$, which he decrypts via algorithm $\mathcal{D}$ to recover the message. The adversary’s wiretap is modeled as another channel ChA, and she accordingly gets an adversary ciphertext $Z \xleftarrow{\$} \mathrm{ChA}(X)$, from which she tries to glean whatever she can about the message.

A channel is a randomized function specified by a transition probability matrix $W$, where $W[x,y]$ is the probability that input $x$ results in output $y$. Here $x, y$ are strings. The canonical example is the Binary Symmetric Channel $\mathrm{BSC}_p$ with crossover probability $p < 1/2$, which takes a binary string $x$ of any length and returns the string $y$ of the same length formed by flipping each bit of $x$ independently with probability $p$. For concreteness and simplicity of exposition we will often phrase discussions in the setting where ChR, ChA are BSCs with crossover probabilities $p_R, p_A < 1/2$ respectively, but our results apply in much greater generality. In this case the assumption that ChA is “noisier” than ChR corresponds to the assumption that $p_R < p_A$. This is the only assumption made: the adversary is computationally unbounded, and the scheme is keyless, meaning sender and receiver are not assumed to share a priori any information not known to the adversary.
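For readers who want to experiment, the following Python sketch simulates $\mathrm{BSC}_p$ on a list of bits and sends one random bit string, standing in for a sender ciphertext, through a receiver channel and an adversary channel. The crossover values $p_R = 0.1$ and $p_A = 0.2$ are arbitrary illustrations of the assumption $p_R < p_A$, and the function names are ours.

```python
import random


def bsc(x, p, rng=random):
    """BSC_p: flip each bit of x independently with probability p."""
    return [bit ^ (rng.random() < p) for bit in x]


def error_rate(x, y):
    """Fraction of positions where y differs from x."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    rng = random.Random(1)
    p_r, p_a = 0.1, 0.2                               # receiver channel less noisy than adversary's
    x = [rng.randrange(2) for _ in range(100_000)]    # stand-in for a sender ciphertext X
    y = bsc(x, p_r, rng)                              # receiver ciphertext Y
    z = bsc(x, p_a, rng)                              # adversary ciphertext Z
    print(f"receiver ~ {error_rate(x, y):.3f}, adversary ~ {error_rate(x, z):.3f}")
```

The empirical error rates concentrate around $p_R$ and $p_A$; the scheme's job is to exploit exactly this gap.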

The setting now has a literature encompassing hundreds of papers. (See the survey [28] or the book [6].) Schemes must satisfy two conditions, namely decoding and security. The decoding condition asks that the scheme provide error-correction over the receiver channel, namely that the limit as $k \to \infty$ of the maximum, over all $M$ of length $m$, of $\Pr[\mathcal{D}(\mathrm{ChR}(\mathcal{E}(M))) \neq M]$ is 0, where $k$ is an underlying security parameter of which $m, c$ are functions. The original security condition of [41] was that $\lim_{k \to \infty} I(M; \mathrm{ChA}(\mathcal{E}(M)))/m = 0$, where the message random variable $M$ is uniformly distributed over $\{0,1\}^m$ and $I(M; Z) = H(M) - H(M \mid Z)$ is the mutual information. This was critiqued by [31, 32], who put forth the stronger security condition now in use, namely that $\lim_{k \to \infty} I(M; \mathrm{ChA}(\mathcal{E}(M))) = 0$, the random variable $M$ continuing to be uniformly distributed over $\{0,1\}^m$. The rate $\mathrm{Rate}(\mathcal{E})$ of a scheme is $m/c$.

Fig. 1. Wiretap channel model. See text for explanations.

The literature has focused on determining the maximum possible rate. (We stress that the maximum is over all possible schemes, not just ones that are explicitly given or polynomial time.) Shannon’s seminal result [37] says that if we ignore security and merely consider achieving the decoding condition, then this optimal rate is the receiver channel capacity, which in the BSC case is $1 - h_2(p_R)$, where $h_2$ is the binary entropy function $h_2(p) = -p\lg(p) - (1-p)\lg(1-p)$. He gave non-constructive proofs of existence of schemes meeting capacity. Coming in with this background and the added security condition, it was natural for the wiretap community to follow Shannon’s lead and begin by asking what is the maximum achievable rate subject to both the security and decoding conditions. This optimal rate is called the secrecy capacity and, in the case of BSCs, equals the difference $(1 - h_2(p_R)) - (1 - h_2(p_A)) = h_2(p_A) - h_2(p_R)$ in capacities of the receiver and adversary channels. Non-constructive proofs of the existence of schemes with this optimal rate were given in [41, 14, 7]. Efforts to obtain explicit, secure schemes of optimal rate followed [40, 33, 30, 29, 21].
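The two capacity formulas above are easy to evaluate numerically. The Python sketch below computes $h_2$, the BSC capacity $1 - h_2(p)$ and the BSC secrecy capacity $h_2(p_A) - h_2(p_R)$; the example parameters $p_R = 0.1$, $p_A = 0.2$ are our own.

```python
from math import log2


def h2(p):
    """Binary entropy h2(p) = -p lg p - (1 - p) lg(1 - p), with h2(0) = h2(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_capacity(p):
    """Shannon capacity of BSC_p."""
    return 1 - h2(p)


def bsc_secrecy_capacity(p_r, p_a):
    """(1 - h2(p_R)) - (1 - h2(p_A)) = h2(p_A) - h2(p_R), for p_R < p_A <= 1/2."""
    if not 0 <= p_r < p_a <= 0.5:
        raise ValueError("expects 0 <= p_R < p_A <= 1/2")
    return bsc_capacity(p_r) - bsc_capacity(p_a)


if __name__ == "__main__":
    print(round(bsc_capacity(0.1), 4))               # 0.531
    print(round(bsc_secrecy_capacity(0.1, 0.2), 4))  # 0.2529
```

So in this example roughly a quarter of a message bit per channel bit can be sent securely, compared with about half a bit per channel bit if security is ignored.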

Context. Practical interest in the wiretap setting is escalating. Its proponents note two striking benefits over conventional cryptography: (1) no computational assumptions, and (2) no keys and hence no key distribution. Item (1) is attractive to governments who are concerned with long-term security and worried about quantum computing. Item (2) is attractive in a world where vulnerable, low-power devices are proliferating and key distribution and key management are insurmountable obstacles to security. The practical challenge is to realize a secrecy capacity, meaning to ensure by physical means that the adversary channel is noisier than the receiver one. The degradation with distance of radio communication signal quality is the basis of several approaches being investigated for wireless settings. Government-sponsored Ziva Corporation [42] is using optical techniques to build a receiver channel in such a way that wiretapping results in a degraded channel. A program called Physical Layer Security, aimed at practical realization of the wiretap channel, is the subject of books [6] and conferences [24]. All this activity means that schemes are being sought for implementation. If so, we need privacy definitions that yield security in applications, and we need constructive results yielding practical schemes achieving privacy under these definitions. This is what we aim to supply.

Definitions. A security metric xs associates to an encryption function $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ and an adversary channel ChA a number $\mathrm{Adv}^{\mathrm{xs}}(\mathcal{E}; \mathrm{ChA})$ that measures the maximum “advantage” of an adversary in breaking the scheme under metric xs. For example, the metric underlying the current, above-mentioned security condition is $\mathrm{Adv}^{\mathrm{mis\text{-}r}}(\mathcal{E}; \mathrm{ChA}) = I(M; \mathrm{ChA}(\mathcal{E}(M)))$, where $M$ is uniformly distributed over $\{0,1\}^m$. We call this the mis-r (mutual-information security for random messages) metric because messages are assumed to be random. From the cryptographic perspective, this is extraordinarily weak, for we know that real messages are not random. (They may be files, votes or any type of structured data, often with low entropy. Contrary to a view in the I&C community, compression does not render data random, as can be seen from the case of votes, where the message space has very low entropy.) This leads us to suggest a stronger metric that we call mutual-information security, defined via $\mathrm{Adv}^{\mathrm{mis}}(\mathcal{E}; \mathrm{ChA}) = \max_M I(M; \mathrm{ChA}(\mathcal{E}(M)))$, where the maximum is over all random variables $M$ over $\{0,1\}^m$, regardless of their distribution.
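To see the two mutual-information metrics side by side, the Python sketch below evaluates $I(M;Z)$ exactly for a deliberately tiny, hypothetical scheme that is not from the paper: a 1-bit message $M$ is encrypted as $(R, R \oplus M)$ for a uniform bit $R$, and ChA is $\mathrm{BSC}_p$ on both bits. Uniform $M$ gives $\mathrm{Adv}^{\mathrm{mis\text{-}r}}$; maximizing over a grid of distributions only gives a lower bound on $\mathrm{Adv}^{\mathrm{mis}}$, whose maximum ranges over all distributions. For this toy the adversary's view reduces to a BSC with crossover $2p(1-p)$ acting on $M$, so the uniform value should be $1 - h_2(2p(1-p))$. The security conditions in the text are asymptotic statements about families of schemes; a single fixed-size example only illustrates the quantities.

```python
from itertools import product
from math import log2


def bsc_dist(x, p):
    """Output distribution of BSC_p on the bit tuple x."""
    out = {}
    for e in product((0, 1), repeat=len(x)):
        y = tuple(xi ^ ei for xi, ei in zip(x, e))
        w = sum(e)
        out[y] = out.get(y, 0.0) + p ** w * (1 - p) ** (len(x) - w)
    return out


def view(m, p):
    """Distribution of Z = ChA(E(m)) for the toy scheme E(m) = (R, R xor m)."""
    z = {}
    for r in (0, 1):
        for y, py in bsc_dist((r, r ^ m), p).items():
            z[y] = z.get(y, 0.0) + 0.5 * py
    return z


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(v * log2(v) for v in dist.values() if v > 0)


def mutual_information(q1, p):
    """I(M; Z) = H(Z) - H(Z | M) when Pr[M = 1] = q1."""
    pm = {0: 1 - q1, 1: q1}
    views = {m: view(m, p) for m in (0, 1)}
    mix = {}
    for m, w in pm.items():
        for z, pz in views[m].items():
            mix[z] = mix.get(z, 0.0) + w * pz
    return entropy(mix) - sum(w * entropy(views[m]) for m, w in pm.items())


if __name__ == "__main__":
    p = 0.1
    mis_r = mutual_information(0.5, p)                                # uniform M
    mis_lb = max(mutual_information(k / 100, p) for k in range(101))  # grid lower bound
    print(f"Adv_mis-r = {mis_r:.4f}, grid lower bound on Adv_mis = {mis_lb:.4f}")
```

For $p = 0.1$ the uniform value is roughly 0.32 bits. For this symmetric toy the grid maximum sits at the uniform distribution; the separation between mis-r and mis discussed below requires a less symmetric scheme.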

At  this  point,  we  have  a  legitimate  metric,  but  how  does  it  capture  privacy? 
The  intuition  is  that  it  is  measuring  the  difference  in  the  number  of  bits  required 
to  encode  the  message  before  and  after  seeing  the  ciphertext.  This  intuition 
is  alien  to  cryptographers,  whose  metrics  are  based  on  much  more  direct  and 
usage-driven  privacy  requirements.  Cryptographers  understand  since  [19]  that 
encryption  must  hide  all  partial  information  about  the  message,  meaning  the 
adversary  should  have  little  advantage  in  computing  a  function  of  the  message 
given  the  ciphertext.  (Examples  of  partial  information  about  a  message  include 
its  first  bit  or  even  the  XOR  of  the  first  and  second  bits.)  The  mis-r  and  mis 
metrics  ask  for  nothing  like  this  and  are  based  on  entirely  different  intuition.  We 
extend Goldwasser and Micali’s semantic security [19] definition to the wiretap setting, defining $\mathrm{Adv}^{\mathrm{ss}}(\mathcal{E}; \mathrm{ChA})$ as

$$
\max_{f, M} \Bigl( \max_{A} \Pr\bigl[A(\mathrm{ChA}(\mathcal{E}(M))) = f(M)\bigr] - \max_{S} \Pr\bigl[S(m) = f(M)\bigr] \Bigr)\,.
$$

Within the parentheses is the maximum probability that an adversary $A$, given the adversary ciphertext, can compute the result of function $f$ on the message, minus the maximum probability that a simulator $S$ can do the same given only the length of the message. We also define a distinguishing security (ds) metric as an analog of indistinguishability [19] via

$$
\mathrm{Adv}^{\mathrm{ds}}(\mathcal{E}; \mathrm{ChA}) = \max_{A, M_0, M_1} 2 \Pr\bigl[A(M_0, M_1, \mathrm{ChA}(\mathcal{E}(M_b))) = b\bigr] - 1\,,
$$

where the challenge bit $b$ is uniformly distributed over $\{0,1\}$ and the maximum is over all $m$-bit messages $M_0, M_1$ and all adversaries $A$. For any metric xs, we say $\mathcal{E}$ provides XS-security over ChA if $\lim_{k \to \infty} \mathrm{Adv}^{\mathrm{xs}}(\mathcal{E}; \mathrm{ChA}) = 0$.
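For a finite scheme the maximum over adversaries in $\mathrm{Adv}^{\mathrm{ds}}$ can be computed exactly, using the standard fact that the best distinguisher between two known distributions achieves $2\Pr[\text{guess} = b] - 1$ equal to their statistical distance. The Python sketch below applies this to a deliberately tiny, hypothetical scheme (our illustration, not one from the paper): a 1-bit message $M$ is encrypted as $\mathcal{E}(M) = (R, R \oplus M)$ with $R$ a uniform bit, and ChA is $\mathrm{BSC}_p$ on each bit. For this toy the distance works out to $(1-2p)^2$.

```python
from itertools import product


def bsc_dist(x, p):
    """Output distribution of BSC_p on the bit tuple x."""
    out = {}
    for e in product((0, 1), repeat=len(x)):
        y = tuple(xi ^ ei for xi, ei in zip(x, e))
        w = sum(e)
        out[y] = out.get(y, 0.0) + p ** w * (1 - p) ** (len(x) - w)
    return out


def view(m, p):
    """Distribution of ChA(E(m)) for the toy scheme E(m) = (R, R xor m), R uniform."""
    z = {}
    for r in (0, 1):
        for y, py in bsc_dist((r, r ^ m), p).items():
            z[y] = z.get(y, 0.0) + 0.5 * py
    return z


def statistical_distance(d0, d1):
    """Total variation distance between two finite distributions."""
    keys = set(d0) | set(d1)
    return 0.5 * sum(abs(d0.get(k, 0.0) - d1.get(k, 0.0)) for k in keys)


def adv_ds(p, messages=(0, 1)):
    """max over M0, M1 of SD(ChA(E(M0)), ChA(E(M1))), which equals Adv^ds."""
    return max(statistical_distance(view(m0, p), view(m1, p))
               for m0 in messages for m1 in messages)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(adv_ds(p), 6), round((1 - 2 * p) ** 2, 6))
```

The advantage vanishes only at $p = 1/2$: at fixed size, a noisy adversary channel alone does not make this toy secure, which is why the schemes in the paper spread the message across many channel uses.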

Relations. The mutual information between message and ciphertext, as measured by mis, is, as noted above, the change in the number of bits needed to encode the message created by seeing the ciphertext. It is not clear why this should measure privacy in the sense of semantic security. Yet we are able to show that mutual-information security and semantic security are equivalent, meaning an encryption scheme is MIS-secure if and only if it is SS-secure. Fig. 2 summarizes this and the other relations we establish, which together settle all possible relations between the notions. The equivalence between SS and DS is the information-theoretic analogue of the corresponding well-known equivalence in the computational setting [19, 3]. As in the computational setting, it brings the important benefit that we can now work with the technically simpler DS. We then show that MIS implies DS by reducing to Pinsker’s inequality. We show conversely that DS implies MIS via a general relation between mutual information and statistical distance. As Fig. 2 indicates, the asymptotic relations are all underlain by concrete quantitative and polynomial relations between the advantages. On the other hand, we show that in general MIS-R does not imply MIS, meaning the former is strictly weaker than the latter. We do this by exhibiting an encryption function $\mathcal{E}$ and channel ChA such that $\mathcal{E}$ is MIS-R-secure relative to ChA but MIS-insecure relative to ChA. Furthermore, we do this for the case that ChA is a BSC.

13. Semantic Security for the Wiretap Channel

[Fig. 2 diagram: implications among MIS, DS, SS and MIS-R; the numbered arrows carry the bounds listed below.]

1. $\epsilon_{\mathrm{ds}} \leq \sqrt{2\epsilon_{\mathrm{mis}}}$ (Theorem 5)
2. $\epsilon_{\mathrm{mis}} \leq 2\epsilon_{\mathrm{ds}} \lg(2^c/\epsilon_{\mathrm{ds}})$ (Theorem 8)
3. $\epsilon_{\mathrm{ss}} \leq \epsilon_{\mathrm{ds}}$ (Theorem 1)
4. $\epsilon_{\mathrm{ds}} \leq 2\epsilon_{\mathrm{ss}}$ (Theorem 1)

Fig. 2. Relations between notions. An arrow A → B is an implication, meaning every scheme that is A-secure is also B-secure, while a barred arrow A ↛ B is a separation, meaning that there is an A-secure scheme that is not B-secure. If $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ is the encryption algorithm and ChA the adversary channel, we let $\epsilon_{\mathrm{xs}} = \mathrm{Adv}^{\mathrm{xs}}(\mathcal{E}; \mathrm{ChA})$. The table then shows the quantitative bounds underlying the annotated implications.

Our scheme. We provide the first explicit scheme with all the following characteristics over a large class of adversary and receiver channels including BSCs: (1) it is DS (hence SS, MIS and MIS-R) secure; (2) it is proven to satisfy the decoding condition; (3) it has optimal rate; (4) it is fully polynomial time, meaning both encryption and decryption algorithms run in polynomial time; and (5) the errors in the security and decoding conditions do not just vanish in the limit but vanish at an exponential rate. Our scheme is based on three main ideas: (1) the use of invertible extractors, (2) analysis via smooth min-entropy, and (3) a (surprising) result saying that for certain types of schemes, security on random messages implies security on all messages.

Recall that the secrecy capacity is the optimal rate for MIS-R schemes meeting the decoding condition, and in the case of BSCs it equals $h_2(p_A) - h_2(p_R)$. Since DS-security is stronger than MIS-R security, the optimal rate could in principle be smaller. Perhaps surprisingly, it isn’t: for a broad class of channels including symmetric channels, the optimal rate is the same for DS and MIS-R security. This follows by applying our above-mentioned result, showing that MIS-R implies MIS for certain types of schemes and channels, to known results on achieving the secrecy capacity for MIS-R. Thus, when we say above that our scheme achieves optimal rate, this rate is in fact the secrecy capacity.

A  common  misconception  is  to  think  that  privacy  and  error-correction  may 
be  completely  de-coupled,  meaning  one  would  first  build  a  scheme  that  is  secure 
when  the  receiver  channel  is  noiseless  and  then  add  an  ECC  on  top  to  meet  the 
decoding  condition  with  a  noisy  receiver  channel.  This  does  not  work  because 
the  error-correction  helps  the  adversary  by  reducing  the  noise  over  the  adversary 
channel.  The  two  requirements  do  need  to  be  considered  together.  Nonetheless, 
our approach is modular, combining (invertible) extractors with existing ECCs
in a black-box way. As a consequence, any ECC of sufficient rate may be used.
This  is  an  attractive  feature  of  our  scheme  from  the  practical  perspective.  In 
addition  our  scheme  is  simple  and  efficient. 

Our  claims  (proven  DS-security  and  decoding  with  optimal  rate)  hold  not 
only  for  BSCs  but  for  a  wide  range  of  receiver  and  adversary  channels. 

A CONCRETE INSTANTIATION. As a consequence of our general paradigm, we prove that the following scheme achieves secrecy capacity for the BSC setting with $p_R < p_A < 1/2$. For any ECC $E: \{0,1\}^k \to \{0,1\}^n$ such that $k \approx (1 - h_2(p_R)) \cdot n$ (such ECCs can be built, e.g., from polar codes [2] or from concatenated codes [18]) and a parameter $t \ge 1$, our encryption function $\mathcal{E}$, on input an $m$-bit message $M$, where $m = b \cdot t$ and $b \approx (h_2(p_A) - h_2(p_R)) \cdot n$, first selects a random $k$-bit string $A \ne 0^k$ and $t$ random $(k - b)$-bit strings $R[1], \ldots, R[t]$. It then splits $M$ into $t$ $b$-bit blocks $M[1], \ldots, M[t]$, and outputs

$$\mathcal{E}(M) = E(A) \,\|\, E(A \odot (M[1] \,\|\, R[1])) \,\|\, \cdots \,\|\, E(A \odot (M[t] \,\|\, R[t])) ,$$

where $\odot$ is multiplication of $k$-bit strings interpreted as elements of the extension field $\mathrm{GF}(2^k)$.
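The concrete scheme above can be exercised end to end at toy scale. The following Python sketch is illustrative only and rests on assumptions that are not in the text: $k = 8$ with $\mathrm{GF}(2^8)$ defined by the AES polynomial $x^8 + x^4 + x^3 + x + 1$, $b = 3$, "first $b$ bits" read as the most significant bits, and a 3-fold repetition code standing in for the ECC $E$ (it is linear and injective but nowhere near capacity, so the sketch says nothing about rate or security). All function names are hypothetical. Decryption decodes every codeword, multiplies by $A^{-1}$ and keeps the first $b$ bits, which recovers each block.

```python
import secrets

K = 8        # field size k (toy value; assumption)
B = 3        # message block length b (toy value; assumption)
MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible, so GF(2^8) = GF(2)[x]/(MOD)
REP = 3      # repetition factor of the stand-in code


def gf_mul(a: int, b: int) -> int:
    """Multiply k-bit strings a, b as elements of GF(2^K) (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> K:  # degree reached K: reduce modulo MOD
            a ^= MOD
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero a, computed as a^(2^K - 2)."""
    result, e = 1, (1 << K) - 2
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def rep_encode(x: int) -> list[int]:
    """Stand-in ECC E: repeat each of the K bits REP times (not capacity-achieving)."""
    return [(x >> (K - 1 - i)) & 1 for i in range(K) for _ in range(REP)]


def rep_decode(bits: list[int]) -> int:
    """Majority-vote decoder D for the stand-in code."""
    x = 0
    for i in range(K):
        x = (x << 1) | int(sum(bits[REP * i:REP * (i + 1)]) > REP // 2)
    return x


def encrypt(blocks: list[int]) -> list[int]:
    """E(A) || E(A*(M[1]||R[1])) || ... || E(A*(M[t]||R[t])) with random A != 0."""
    a = 1 + secrets.randbelow((1 << K) - 1)   # random nonzero A
    out = rep_encode(a)
    for m in blocks:                           # each m is a B-bit block M[i]
        r = secrets.randbits(K - B)            # fresh (k - b)-bit R[i]
        out += rep_encode(gf_mul(a, (m << (K - B)) | r))
    return out


def decrypt(bits: list[int]) -> list[int]:
    """Decode every codeword, then keep the first B bits of A^{-1} * codeword."""
    n = REP * K
    words = [rep_decode(bits[j:j + n]) for j in range(0, len(bits), n)]
    a_inv = gf_inv(words[0])
    return [gf_mul(a_inv, w) >> (K - B) for w in words[1:]]
```

In the actual scheme the repetition code would be replaced by a code with $k \approx (1 - h_2(p_R)) \cdot n$ and $b$ chosen as above; the field arithmetic stays the same.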

RELATED WORK. This paper was formed by merging [5, 4], which together function as the full version of this paper. We refer there for all proofs omitted from this paper and also for comprehensive surveys of the large body of work related to wiretap security and, more broadly, to information-theoretically secure communication in a noisy setup. Here we discuss the most closely related work.

Relations between entropy- and distance-based security metrics have been explored in settings other than the wiretap one, using techniques similar to ours [13, 7, 15], the last in the context of statistically private commitment. Iwamoto and Ohta [25] relate different notions of indistinguishability for statistically secure symmetric encryption. In the context of key agreement in the wiretap setting (a simpler problem than ours), Csiszár [13] relates MIS-R and RDS, the latter being DS for random messages.

Wyner’s syndrome coding approach [41] and extensions by Cohen and Zémor [9, 10] only provide weak security. Hayashi and Matsumoto [21] replace the matrix multiplication in these schemes by a universal hash function and show MIS-R security of their scheme. Their result could be used to obtain an alternative to the proof of RDS security used in our scheme for the common case where the extractor is obtained from a universal hash function. However, the obtained security bound is not explicit, making their result unsuitable for applications. Moreover, our proof also yields, as a special case, a cleaner proof of the result of [21].

Syndrome coding can be viewed as a special case of coset coding, which is based on an inner code and an outer code. Instantiations of this approach have been considered in [40, 33, 38] using LDPC codes, but polynomial-time decoding is possible only if the adversary channel is a binary erasure channel or the receiver channel is noiseless.

Mahdavifar and Vardy [29] and Hof and Shamai [23] (similar ideas also appeared in [26, 1]) use polar codes [2] to build encryption schemes for the wiretap setting with binary-input symmetric channels. However, these schemes only provide weak security. The full version [30] of [29] provides a variant of the scheme achieving MIS-R security. They use results of the present paper (namely, our above-mentioned result that MIS-R implies MIS for certain schemes, whose proof they reproduce for completeness) to conclude that their scheme is also MIS-secure. However, there is no proof that decryption (decoding) is possible in their scheme, even in principle, let alone in polynomial time. Also, efficient generation of polar codes is an open research direction with first results only now appearing [39], and hence relying on this specific code family may be somewhat problematic. Our solution, in contrast, works for arbitrary codes.

As explained in [5], fuzzy extractors [17] can be used to build a DS-secure scheme with polynomial-time encoding and decoding, but the rate of this scheme is far from optimal and the approach is inherently limited to low-rate schemes. We note that (seedless) invertible extractors were previously used in [8] within schemes for the “wiretap channel II” model [34], where the adversary (adaptively) erases ciphertext bits. In contrast to our work, only random-message security was considered in [8].

2  Preliminaries 

BASIC NOTATION AND DEFINITIONS. “PT” stands for “polynomial-time.” If $s$ is a binary string then $s[i]$ denotes its $i$-th bit and $|s|$ denotes its length. If $S$ is a set then $|S|$ denotes its size. If $x$ is a real number then $|x|$ denotes its absolute value. A function $f: \{0,1\}^m \to \{0,1\}^n$ is linear if $f(x \oplus y) = f(x) \oplus f(y)$ for all $x, y \in \{0,1\}^m$. A probability distribution is a function $P$ that associates to each $x$ a probability $P(x) \in [0,1]$. The support $\mathrm{supp}(P)$ is the set of all $x$ such that $P(x) > 0$. All probability distributions in this paper are discrete. Associate to random variable $X$ and event $E$ the probability distributions $P_X, P_{X|E}$ defined for all $x$ by $P_X(x) = \Pr[X = x]$ and $P_{X|E}(x) = \Pr[X = x \mid E]$. We denote by $\lg(\cdot)$ the logarithm in base two, and by $\ln(\cdot)$ the natural logarithm. We adopt standard conventions such as $0 \lg 0 = 0 \lg \infty = 0$ and $\Pr[E_1 \mid E_2] = 0$ when $\Pr[E_2] = 0$. The function $h: [0,1] \to [0,1]$ is defined by $h(x) = -x \lg x$. The (Shannon) entropy of probability distribution $P$ is defined by $H(P) = \sum_x h(P(x))$ and the statistical difference between probability distributions $P, Q$ is defined by $\mathrm{SD}(P;Q) = 0.5 \cdot \sum_x |P(x) - Q(x)|$. If $X, Y$ are random variables the (Shannon) entropy is defined by $H(X) = H(P_X) = \sum_x h(P_X(x))$. The conditional entropy is defined via $H(X \mid Y = y) = \sum_x h(P_{X|Y=y}(x))$ and $H(X \mid Y) = \sum_y P_Y(y) \cdot H(X \mid Y = y)$. The mutual information is defined by $I(X;Y) = H(X) - H(X \mid Y)$. The statistical or variational distance between random variables $X_1, X_2$ is $\mathrm{SD}(X_1;X_2) = \mathrm{SD}(P_{X_1};P_{X_2}) = 0.5 \cdot \sum_x |\Pr[X_1 = x] - \Pr[X_2 = x]|$. The min-entropy of random variable $X$ is $H_\infty(X) = -\lg(\max_x \Pr[X = x])$ and if $Z$ is also a random variable the conditional min-entropy is $H_\infty(X|Z) = -\lg\big(\sum_z \Pr[Z = z] \max_x \Pr[X = x \mid Z = z]\big)$.
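As a concrete check of these definitions, here is a small Python sketch (helper names are hypothetical; distributions are dictionaries, and joint distributions are keyed by pairs $(x, y)$) that computes $H$, $\mathrm{SD}$, $H(X \mid Y)$, $I(X;Y)$, $H_\infty$ and the conditional min-entropy exactly as defined above. The usage example takes a uniform bit $X$ sent through a BSC with crossover probability 0.1.

```python
import math
from collections import defaultdict


def H(p: dict) -> float:
    """Shannon entropy: sum over x of h(P(x)) with h(x) = -x lg x and 0 lg 0 = 0."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def SD(p: dict, q: dict) -> float:
    """Statistical distance 0.5 * sum_x |P(x) - Q(x)|."""
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))


def marginals(joint: dict) -> tuple[dict, dict]:
    """Marginals P_X and P_Y of a joint distribution {(x, y): prob}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), v in joint.items():
        px[x] += v
        py[y] += v
    return dict(px), dict(py)


def cond_entropy(joint: dict) -> float:
    """H(X | Y) = sum_y P_Y(y) * H(X | Y = y)."""
    _, py = marginals(joint)
    total = 0.0
    for y, w in py.items():
        if w > 0:
            cond = {x: v / w for (x, yy), v in joint.items() if yy == y}
            total += w * H(cond)
    return total


def mutual_info(joint: dict) -> float:
    """I(X; Y) = H(X) - H(X | Y)."""
    px, _ = marginals(joint)
    return H(px) - cond_entropy(joint)


def min_entropy(p: dict) -> float:
    """H_inf(X) = -lg max_x Pr[X = x]."""
    return -math.log2(max(p.values()))


def cond_min_entropy(joint: dict) -> float:
    """H_inf(X | Z) = -lg sum_z Pr[Z = z] * max_x Pr[X = x | Z = z]."""
    best = defaultdict(float)
    for (x, z), v in joint.items():
        best[z] = max(best[z], v)  # Pr[Z=z] * max_x Pr[X=x | Z=z] = max_x Pr[X=x, Z=z]
    return -math.log2(sum(best.values()))


# Uniform bit X sent through a BSC with crossover probability 0.1.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(mutual_info(joint), cond_min_entropy(joint))
```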

TRANSFORMS, CHANNELS AND ALGORITHMS. We say that $T$ is a transform with domain $D$ and range $R$, written $T: D \to R$, if $T(x)$ is a random variable over $R$ for every $x \in D$. Thus, $T$ is fully specified by a sequence $P = \{P_x\}_{x \in D}$ of probability distributions over $R$, where $P_x(y) = \Pr[T(x) = y]$ for all $x \in D$ and $y \in R$. We call $P$ the distribution associated to $T$. This distribution can be specified by a $|D|$ by $|R|$ transition probability matrix $W$ defined by $W[x, y] = P_x(y)$. A channel is simply a transform. This is a very general notion of a channel, but it does mean channels are memoryless in the sense that two successive transmissions over the same channel are independent random variables. The transition probability matrix representation is the most common one in this case. A (randomized) algorithm is also a transform. Finally, an adversary too is a transform, and so is a simulator.

If $T: \{0,1\} \to R$ is a transform we may apply it to inputs of any length. The understanding is that $T$ is applied independently to each bit of the input. For example $\mathrm{BSC}_p$, classically defined as a 1-bit channel, is here viewed as taking inputs of arbitrary length and flipping each bit independently with probability $p$. Similarly, we apply a transform $T: \{0,1\}^l \to R$ to any input whose length is divisible by $l$. We say that a transform $T: D \to R$ with transition matrix $W$ is symmetric if there exists a partition of the range as $R = R_1 \cup \cdots \cup R_n$ such that for all $i$ the sub-matrix $W_i = W[\cdot, R_i]$ induced by the columns in $R_i$ is strongly symmetric, i.e., all rows of $W_i$ are permutations of each other, and all columns of $W_i$ are permutations of each other.
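The transition-matrix view and the symmetry condition can be checked mechanically. This Python sketch (hypothetical names) builds $W$ for $\mathrm{BSC}_p$ applied bitwise to $l$-bit inputs and tests strong symmetry of the whole matrix, i.e., the trivial partition with a single block $R_1 = R$; a Z-channel matrix, used here as an assumed counterexample, fails the row condition.

```python
from itertools import product


def bsc_matrix(p: float, l: int) -> list[list[float]]:
    """Transition matrix W[x][y] = Pr[BSC_p(x) = y] for l-bit inputs, bits flipped independently."""
    words = list(product((0, 1), repeat=l))

    def prob(x, y):
        flips = sum(a != b for a, b in zip(x, y))
        return p ** flips * (1 - p) ** (l - flips)

    return [[prob(x, y) for y in words] for x in words]


def strongly_symmetric(w: list[list[float]]) -> bool:
    """True if all rows are permutations of each other and all columns are too."""
    rows_ok = all(sorted(row) == sorted(w[0]) for row in w)
    cols = [list(col) for col in zip(*w)]
    cols_ok = all(sorted(col) == sorted(cols[0]) for col in cols)
    return rows_ok and cols_ok


print(strongly_symmetric(bsc_matrix(0.1, 2)))        # True: a single block suffices
print(strongly_symmetric([[1.0, 0.0], [0.3, 0.7]]))  # False: the Z-channel rows differ
```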

3  Security  metrics  and  relations 

ENCRYPTION FUNCTIONS AND SCHEMES. An encryption function is a transform $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ where $m$ is the message length and $c$ is the sender ciphertext length. The rate of $\mathcal{E}$ is $\mathrm{Rate}(\mathcal{E}) = m/c$. If $\mathrm{ChR}: \{0,1\}^c \to \{0,1\}^d$ is a receiver channel then a decryption function for $\mathcal{E}$ over $\mathrm{ChR}$ is a transform $\mathcal{D}: \{0,1\}^d \to \{0,1\}^m$ whose decryption error $\mathrm{DE}(\mathcal{E};\mathcal{D};\mathrm{ChR})$ is defined as the maximum, over all $M \in \{0,1\}^m$, of $\Pr[\mathcal{D}(\mathrm{ChR}(\mathcal{E}(M))) \ne M]$.

An encryption scheme $\mathcal{E} = \{\mathcal{E}_k\}_{k \in \mathbb{N}}$ is a family of encryption functions where $\mathcal{E}_k: \{0,1\}^{m(k)} \to \{0,1\}^{c(k)}$ for functions $m, c: \mathbb{N} \to \mathbb{N}$ called the message length and sender ciphertext length of the scheme. The rate of $\mathcal{E}$ is the function $\mathrm{Rate}_{\mathcal{E}}: \mathbb{N} \to \mathbb{R}$ defined by $\mathrm{Rate}_{\mathcal{E}}(k) = \mathrm{Rate}(\mathcal{E}_k)$ for all $k \in \mathbb{N}$. Suppose $\mathrm{ChR} = \{\mathrm{ChR}_k\}_{k \in \mathbb{N}}$ is a family of receiver channels where $\mathrm{ChR}_k: \{0,1\}^{c(k)} \to \{0,1\}^{d(k)}$. Then a decryption scheme for $\mathcal{E}$ over $\mathrm{ChR}$ is a family $\mathcal{D} = \{\mathcal{D}_k\}_{k \in \mathbb{N}}$ where $\mathcal{D}_k: \{0,1\}^{d(k)} \to \{0,1\}^{m(k)}$ is a decryption function for $\mathcal{E}_k$ over $\mathrm{ChR}_k$. We say that $\mathcal{D}$ is a correct decryption scheme for $\mathcal{E}$ relative to $\mathrm{ChR}$ if the limit as $k \to \infty$ of $\mathrm{DE}(\mathcal{E}_k;\mathcal{D}_k;\mathrm{ChR}_k)$ is 0. We say that encryption scheme $\mathcal{E}$ is decryptable relative to $\mathrm{ChR}$ if there exists a correct decryption scheme for $\mathcal{E}$ relative to $\mathrm{ChR}$. A family $\{S_k\}_{k \in \mathbb{N}}$ (e.g., an encryption or decryption scheme) is PT if there is a polynomial-time computable function which on input $1^k$ (the unary representation of $k$) and $x$ returns $S_k(x)$. Our constructs will provide PT encryption and decryption schemes.

SECURITY METRICS. Let $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ be an encryption function and let $\mathrm{ChA}: \{0,1\}^c \to \{0,1\}^d$ be an adversary channel. Security depends only on these, not on the receiver channel. We now define semantic security (ss), distinguishing security (ds), random mutual-information security (mis-r) and mutual-information security (mis). For each type of security $\mathrm{xs} \in \{\mathrm{ss}, \mathrm{ds}, \mathrm{mis\text{-}r}, \mathrm{mis}\}$, we associate to $\mathcal{E}; \mathrm{ChA}$ a real number $\mathrm{Adv}^{\mathrm{xs}}(\mathcal{E};\mathrm{ChA})$. The smaller this number, the more secure $\mathcal{E}; \mathrm{ChA}$ is according to the metric in question. The security of an encryption function is quantitative, as captured by the advantage. We might measure it in bits, saying that $\mathcal{E}; \mathrm{ChA}$ has $s$ bits of xs-security if $\mathrm{Adv}^{\mathrm{xs}}(\mathcal{E};\mathrm{ChA}) \le 2^{-s}$. A qualitative definition of “secure,” meaning one under which something is either secure or not secure, may only be made asymptotically, meaning for schemes. We say encryption scheme $\mathcal{E} = \{\mathcal{E}_k\}_{k \in \mathbb{N}}$ is XS-secure relative to $\mathrm{ChA} = \{\mathrm{ChA}_k\}_{k \in \mathbb{N}}$ if $\lim_{k \to \infty} \mathrm{Adv}^{\mathrm{xs}}(\mathcal{E}_k;\mathrm{ChA}_k) = 0$. This does not mandate any particular rate at which the advantage should vanish, but in our constructions the advantage vanishes exponentially in $k$. We define the ss advantage $\mathrm{Adv}^{\mathrm{ss}}(\mathcal{E};\mathrm{ChA})$ as

$$\max_{f, M} \Big( \max_{\mathcal{A}} \Pr[\mathcal{A}(\mathrm{ChA}(\mathcal{E}(M))) = f(M)] - \max_{\mathcal{S}} \Pr[\mathcal{S}(m) = f(M)] \Big) . \tag{1}$$

Here $f$ is a transform with domain $\{0,1\}^m$ that represents partial information about the message. Examples include $f(M) = M$ or $f(M) = M[1]$ or $f(M) = M[1] \oplus \cdots \oplus M[m]$, where $M[i]$ is the $i$-th bit of $M$. But $f$ could be a much more complex function, and could even be randomized. The adversary’s goal is to compute $f(M)$ given an adversary ciphertext $\mathrm{ChA}(\mathcal{E}(M))$ formed by encrypting message $M$. The probability that it does this is $\Pr[\mathcal{A}(\mathrm{ChA}(\mathcal{E}(M))) = f(M)]$, which is then maximized over all adversaries $\mathcal{A}$ to achieve strategy independence. We then subtract the a priori probability of success, meaning the maximum possible probability of computing $f(M)$ if you are not given the adversary ciphertext. Finally, the outer max over all $f, M$ ensures that the metric measures the extent to which any partial information leaks, regardless of message distribution. We define the distinguishing advantage via

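$$\mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA}) = \max_{\mathcal{A}, M_0, M_1} 2 \Pr[\mathcal{A}(M_0, M_1, \mathrm{ChA}(\mathcal{E}(M_b))) = b] - 1 \tag{2}$$

$$= \max_{M_0, M_1} \mathrm{SD}(\mathrm{ChA}(\mathcal{E}(M_0)); \mathrm{ChA}(\mathcal{E}(M_1))) . \tag{3}$$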


13.  Semantic  Security  for  the  Wiretap  Channel 




Mihir  Bellare,  Stefano  Tessaro,  and  Alexander  Vardy 


In Eq. (2), $\Pr[\mathcal{A}(M_0, M_1, \mathrm{ChA}(\mathcal{E}(M_b))) = b]$ is the probability that adversary $\mathcal{A}$, given $m$-bit messages $M_0, M_1$ and an adversary ciphertext emanating from $M_b$, correctly identifies the random challenge bit $b$. Since the a priori success probability is $1/2$, the advantage is scaled accordingly. This advantage is equal to the statistical distance between the random variables $\mathrm{ChA}(\mathcal{E}(M_0))$ and $\mathrm{ChA}(\mathcal{E}(M_1))$, which is Eq. (3).
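Eq. (3) turns the distinguishing advantage of a fixed encryption function into a finite computation when everything is small. The Python sketch below assumes $\mathrm{ChA} = \mathrm{BSC}_p$ and describes a (possibly randomized) encryption function by the list of equally likely ciphertexts for each message; both example encryption functions are hypothetical toys. For the identity map on one bit the advantage is $|1 - 2p|$, while for $\mathcal{E}(M) = (R, R \oplus M)$ with a random bit $R$ the adversary only learns $M$ through two noisy bits and the advantage drops to $(1 - 2p)^2$.

```python
from itertools import combinations, product


def bsc_output(x: tuple, p: float) -> dict:
    """Distribution of BSC_p applied bitwise to the bit tuple x."""
    dist = {}
    for y in product((0, 1), repeat=len(x)):
        flips = sum(a != b for a, b in zip(x, y))
        dist[y] = p ** flips * (1 - p) ** (len(x) - flips)
    return dist


def sd(p: dict, q: dict) -> float:
    """Statistical distance between two distributions given as dicts."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def adversary_views(enc: dict, p: float) -> dict:
    """enc[M] lists the equally likely ciphertexts of E(M); returns M -> dist of BSC_p(E(M))."""
    views = {}
    for m, cts in enc.items():
        dist = {}
        for ct in cts:
            for y, v in bsc_output(ct, p).items():
                dist[y] = dist.get(y, 0.0) + v / len(cts)
        views[m] = dist
    return views


def adv_ds(views: dict) -> float:
    """Eq. (3): maximum over message pairs of SD(ChA(E(M0)); ChA(E(M1)))."""
    return max(sd(views[m0], views[m1]) for m0, m1 in combinations(views, 2))


identity = adversary_views({0: [(0,)], 1: [(1,)]}, 0.1)
padded = adversary_views({0: [(0, 0), (1, 1)], 1: [(0, 1), (1, 0)]}, 0.1)  # E(M) = (R, R xor M)
print(adv_ds(identity), adv_ds(padded))  # about 0.8 and 0.64
```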
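The mutual-information security advantages are defined via

$$\mathrm{Adv}^{\mathrm{mis\text{-}r}}(\mathcal{E};\mathrm{ChA}) = I(U;\mathrm{ChA}(\mathcal{E}(U))) \tag{4}$$

$$\mathrm{Adv}^{\mathrm{mis}}(\mathcal{E};\mathrm{ChA}) = \max_{M} I(M;\mathrm{ChA}(\mathcal{E}(M))) \tag{5}$$

where the random variable $U$ is uniformly distributed over $\{0,1\}^m$.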
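Eq. (4) can be evaluated the same way. The sketch below (hypothetical names) computes $\mathrm{Adv}^{\mathrm{mis\text{-}r}}$ for the toy encryption function $\mathcal{E}(U) = (R, R \oplus U)$ over $\mathrm{BSC}_p$ with one-bit messages, using $I(U;Y) = H(U) + H(Y) - H(U, Y)$, which is equivalent to the definition $H(U) - H(U \mid Y)$. Because the adversary effectively sees $U$ through a BSC with crossover probability $2p(1-p)$, the result should equal $1 - h_2(2p(1-p))$.

```python
import math
from itertools import product


def entropy(dist: dict) -> float:
    """Shannon entropy in bits, with the convention 0 lg 0 = 0."""
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)


def h2(x: float) -> float:
    """Binary entropy function."""
    return entropy({0: x, 1: 1 - x})


def mutual_information(joint: dict) -> float:
    """I(U; Y) = H(U) + H(Y) - H(U, Y), equivalent to H(U) - H(U | Y)."""
    pu, py = {}, {}
    for (u, y), v in joint.items():
        pu[u] = pu.get(u, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return entropy(pu) + entropy(py) - entropy(joint)


def adv_mis_r_toy(p: float) -> float:
    """Eq. (4) for the hypothetical E(U) = (R, R xor U) over BSC_p, U and R uniform bits."""
    joint = {}
    for u, r, e1, e2 in product((0, 1), repeat=4):
        prob = 0.25 * (p if e1 else 1 - p) * (p if e2 else 1 - p)
        y = (r ^ e1, r ^ u ^ e2)  # adversary ciphertext ChA(E(U))
        joint[(u, y)] = joint.get((u, y), 0.0) + prob
    return mutual_information(joint)


print(adv_mis_r_toy(0.1), 1 - h2(2 * 0.1 * 0.9))
```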

DS IS EQUIVALENT TO SS. Theorem 1 below says that SS and DS are equivalent up to a small constant factor in the advantage. This is helpful because DS is more analytically tractable than SS. The proof is an extension of the classical ones in computational cryptography and is given in [5].

Theorem 1. [DS ⇔ SS] Let $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ be an encryption function and $\mathrm{ChA}$ an adversary channel. Then $\mathrm{Adv}^{\mathrm{ss}}(\mathcal{E};\mathrm{ChA}) \le \mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA}) \le 2 \cdot \mathrm{Adv}^{\mathrm{ss}}(\mathcal{E};\mathrm{ChA})$. □

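MIS IMPLIES DS. The KL divergence is a distance measure for probability distributions $P, Q$ defined by $\mathrm{D}(P;Q) = \sum_x P(x) \lg (P(x)/Q(x))$. Let $\mathbf{M}, \mathbf{C}$ be random variables. Probability distributions $J_{\mathbf{M},\mathbf{C}}, I_{\mathbf{M},\mathbf{C}}$ are defined for all $M, C$ by $J_{\mathbf{M},\mathbf{C}}(M, C) = \Pr[\mathbf{M} = M, \mathbf{C} = C]$ and $I_{\mathbf{M},\mathbf{C}}(M, C) = \Pr[\mathbf{M} = M] \cdot \Pr[\mathbf{C} = C]$. Thus $J_{\mathbf{M},\mathbf{C}}$ is the joint distribution of $\mathbf{M}$ and $\mathbf{C}$, while $I_{\mathbf{M},\mathbf{C}}$ is the “independent” or product distribution. The following is standard: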

Lemma 2. Let $\mathbf{M}, \mathbf{C}$ be random variables. Then $I(\mathbf{M};\mathbf{C}) = \mathrm{D}(J_{\mathbf{M},\mathbf{C}}; I_{\mathbf{M},\mathbf{C}})$. □

Pinsker’s inequality, from [35] with the tight constant from [12], lower bounds the KL divergence between two distributions in terms of their statistical distance:

Lemma 3. Let $P, Q$ be probability distributions. Then $\mathrm{D}(P;Q) \ge 2 \cdot \mathrm{SD}(P;Q)^2$. □

To use the above we need the following, whose proof is in [5]:

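Lemma 4. Let $\mathbf{M}$ be uniformly distributed over $\{M_0, M_1\} \subseteq \{0,1\}^m$. Let $g: \{0,1\}^m \to \{0,1\}^c$ be a transform and let $\mathbf{C} = g(\mathbf{M})$. Then $\mathrm{SD}(J_{\mathbf{M},\mathbf{C}}; I_{\mathbf{M},\mathbf{C}})$ equals $\mathrm{SD}(g(M_0); g(M_1))/2$. □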

Combining  the  lemmas,  we  show  the  following  in  [5]: 

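Theorem 5. [MIS ⇒ DS] Let $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ be an encryption function and $\mathrm{ChA}$ an adversary channel. Then $\mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA}) \le \sqrt{2 \cdot \mathrm{Adv}^{\mathrm{mis}}(\mathcal{E};\mathrm{ChA})}$. □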
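For readers who want the arithmetic behind Theorem 5, the lemmas compose as follows. This is a sketch assembled from Lemmas 2 to 4 as stated here, not a reproduction of the proof in [5]. Fix $M_0, M_1$ attaining the maximum in Eq. (3), let $\mathbf{M}$ be uniform on $\{M_0, M_1\}$, and apply Lemma 4 with $g = \mathrm{ChA} \circ \mathcal{E}$ and $\mathbf{C} = g(\mathbf{M})$; the first inequality holds because $\mathrm{Adv}^{\mathrm{mis}}$ maximizes over all message distributions.

```latex
\begin{align*}
\mathrm{Adv}^{\mathrm{mis}}(\mathcal{E};\mathrm{ChA})
  &\ge I(\mathbf{M};\mathbf{C})
   = \mathrm{D}(J_{\mathbf{M},\mathbf{C}}; I_{\mathbf{M},\mathbf{C}})
   && \text{(Eq.~(5); Lemma 2)}\\
  &\ge 2\,\mathrm{SD}(J_{\mathbf{M},\mathbf{C}}; I_{\mathbf{M},\mathbf{C}})^2
   && \text{(Lemma 3)}\\
  &= 2\Big(\tfrac{1}{2}\,\mathrm{SD}(g(M_0); g(M_1))\Big)^2
   = \tfrac{1}{2}\,\mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA})^2
   && \text{(Lemma 4; Eq.~(3))}
\end{align*}
```

Taking square roots gives $\mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA}) \le \sqrt{2 \cdot \mathrm{Adv}^{\mathrm{mis}}(\mathcal{E};\mathrm{ChA})}$.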

DS IMPLIES MIS. The following general lemma from [5] bounds the difference in entropy between two distributions in terms of their statistical distance. It is a slight strengthening of [11, Theorem 16.3.2]. Similar bounds are provided in [22].
Lemma 6. Let $P, Q$ be probability distributions. Let $N = |\mathrm{supp}(P) \cup \mathrm{supp}(Q)|$ and $\epsilon = \mathrm{SD}(P;Q)$. Then $H(P) - H(Q) \le 2\epsilon \cdot \lg(N/\epsilon)$. □

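To exploit this, we define the pairwise statistical distance $\mathrm{PSD}(\mathbf{M};\mathbf{C})$ between random variables $\mathbf{M}, \mathbf{C}$ as the maximum, over all messages $M_0, M_1 \in \mathrm{supp}(P_{\mathbf{M}})$, of $\mathrm{SD}(P_{\mathbf{C}|\mathbf{M}=M_0}; P_{\mathbf{C}|\mathbf{M}=M_1})$. The proof of the following is in [5].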

Lemma 7. Let $\mathbf{M}, \mathbf{C}$ be random variables. Then $\mathrm{SD}(P_{\mathbf{C}}; P_{\mathbf{C}|\mathbf{M}=M}) \le \mathrm{PSD}(\mathbf{M};\mathbf{C})$ for any $M$. □

From  this  we  conclude  in  [5]  that  DS  implies  MIS: 

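Theorem 8. [DS ⇒ MIS] Let $\mathcal{E}: \{0,1\}^m \to \{0,1\}^c$ be an encryption function and $\mathrm{ChA}$ an adversary channel. Let $\epsilon = \mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA})$. Then $\mathrm{Adv}^{\mathrm{mis}}(\mathcal{E};\mathrm{ChA}) \le 2\epsilon \cdot \lg(2^c/\epsilon)$. □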
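To see what Theorem 8 costs in concrete terms, write $\epsilon = 2^{-s}$; then $\lg(2^c/\epsilon) = c + s$ and the bound takes the form below. The numbers in the example are arbitrary illustrations. In general, $s$ bits of DS-security yield about $s - 1 - \lg(c + s)$ bits of MIS-security.

```latex
\epsilon = 2^{-s} \;\Longrightarrow\;
\mathrm{Adv}^{\mathrm{mis}}(\mathcal{E};\mathrm{ChA})
  \le 2 \cdot 2^{-s} \cdot \lg\big(2^{c} \cdot 2^{s}\big)
  = 2^{1-s}\,(c + s),
\qquad \text{e.g. } s = 100,\ c = 2^{20}:\quad 2^{-99}\,(2^{20} + 100) < 2^{-78}.
```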

The somewhat strange-looking form of the bound of Theorem 8 naturally raises the question of whether Lemma 6 is tight. The following says that it is, up to a constant factor of 4. The proof is in [5].

Proposition 9. Let $n > k > 1$ be integers. Let $\epsilon = 2^{-k}$ and $N = 1 + \epsilon 2^n$. Then there are distributions $P, Q$ with $|\mathrm{supp}(P) \cup \mathrm{supp}(Q)| = N$ and $\mathrm{SD}(P;Q) = \epsilon$ and $H(P) - H(Q) \ge 0.5 \cdot \epsilon \cdot \lg(N/\epsilon)$. □

OTHER RELATIONS. We have now justified all the numbered implication arrows in Fig. 2. The un-numbered implication MIS ⇒ MIS-R is trivial. The intuition for the separation MIS-R ⇏ MIS is simple. Let $\mathcal{E}$ be the identity function. Let $\mathrm{ChA}$ faithfully transmit inputs $0^m$ and $1^m$ and be very noisy on other inputs. Then MIS fails because the adversary has high advantage when the message takes on only the values $0^m, 1^m$, but MIS-R-security holds since these messages are unlikely. This example may seem artificial. In [5] we give a more complex example where $\mathrm{ChA}$ is a BSC and the encryption function is no longer trivial.
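The separation example can be made numeric. The Python sketch below is a hypothetical instance: $\mathcal{E}$ is the identity on $m$-bit strings, and "very noisy on other inputs" is modeled, as an assumption, by erasing every input other than $0^m$ and $1^m$ to a symbol `?`. Since this channel is deterministic, $I(\mathbf{M};\mathrm{ChA}(\mathbf{M})) = H(\mathrm{ChA}(\mathbf{M}))$. Under a uniform message the mutual information is tiny, whereas the two-point message distribution on $\{0^m, 1^m\}$ leaks a full bit, so $\mathrm{Adv}^{\mathrm{mis}} \ge 1$ while $\mathrm{Adv}^{\mathrm{mis\text{-}r}}$ is small.

```python
import math


def entropy(dist: dict) -> float:
    """Shannon entropy in bits (0 lg 0 = 0)."""
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)


def toy_cha(x: str, m: int) -> str:
    """Hypothetical ChA: passes 0^m and 1^m through, erases every other input to '?'."""
    return x if x in ("0" * m, "1" * m) else "?"


def info_leaked(msg_dist: dict, m: int) -> float:
    """I(M; ChA(E(M))) for identity E; ChA is deterministic here, so this is H(ChA(M))."""
    out = {}
    for x, px in msg_dist.items():
        y = toy_cha(x, m)
        out[y] = out.get(y, 0.0) + px
    return entropy(out)


m = 10
uniform = {format(i, f"0{m}b"): 2.0 ** -m for i in range(2 ** m)}
two_point = {"0" * m: 0.5, "1" * m: 0.5}
print(info_leaked(uniform, m))    # Eq. (4): about 0.022 bits
print(info_leaked(two_point, m))  # one candidate M in Eq. (5): a full bit
```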

4  DS-Secure  Encryption  Achieving  Secrecy  Capacity 

This  section  presents  our  main  technical  result,  an  encryption  scheme  achieving 
DS-security.  Its  rate,  for  a  large  set  of  adversary  channels,  is  optimal. 

HIGH-LEVEL APPROACH. We start by considering an extension of the usual setting where sender and receiver share a public random value $S$ (i.e., one known to the adversary), which we call the seed. We will call an encryption function in this setting a seeded encryption function. For simplicity, this discussion will focus on the case where $\mathrm{ChR}$ and $\mathrm{ChA}$ are BSCs with respective crossover probabilities $p_R < p_A < 1/2$, and we assume that sender and receiver only want to agree on a joint secret key. If we let $S$ be the seed of an extractor $\mathrm{Ext}: \mathrm{Sds} \times \{0,1\}^k \to \{0,1\}^m$ and are given an error-correcting code $E: \{0,1\}^k \to \{0,1\}^n$ for reliable communication over $\mathrm{BSC}_{p_R}$, a natural approach consists of the sender sending $E(R)$, for a random $k$-bit $R$, to the receiver, and both parties then deriving the key as $K = \mathrm{Ext}(S, R)$, since the receiver can recover $R$ with very high probability.
The achievable key length is at most $H_\infty(R|Z)$, where $Z = \mathrm{BSC}_{p_A}(E(R))$. Yet, it is not hard to see that the most likely outcome, when $Z = z$, is that $R$ equals the unique $r$ such that $E(r) = z$, and that hence $H_\infty(R|Z) = n \cdot \lg(1/(1 - p_A))$, falling short of achieving the secrecy capacity $h_2(p_A) - h_2(p_R)$. To overcome this, we observe the following. We can think of $\mathrm{BSC}_{p_A}$ as adding an $n$-bit noise vector $\mathbf{E}$ to its input $E(R)$, where each bit $\mathbf{E}[i]$ of the noise vector takes value one with probability $p_A$. With overwhelming probability, $\mathbf{E}$ is (roughly) uniformly distributed on the set of $n$-bit vectors with Hamming weight (approximately) $p_A \cdot n$, and there are (approximately) $2^{n \cdot h_2(p_A)}$ such vectors. Therefore, choosing the noise uniformly from such vectors does not change the experiment much, and moreover, in this new experiment, one can show that roughly $H_\infty(R|Z) \ge k - n \cdot (1 - h_2(p_A))$, which yields optimal rate using an optimal code with $k \approx (1 - h_2(p_R)) \cdot n$. We will make this precise for a general class of symmetric channels via the notion of smooth min-entropy [36].
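The gap between the two estimates is easy to see numerically. The Python sketch below (the values $n = 1000$ and $p_A = 0.1$ are illustrative assumptions) compares $\lg(1/(1 - p_A))$, the per-bit min-entropy of the BSC noise, with $\frac{1}{n}\lg\binom{n}{p_A n}$, the per-bit logarithm of the number of noise vectors of typical weight, which approaches $h_2(p_A)$. The second quantity is what a uniformly chosen typical noise vector contributes, which is the intuition that smooth min-entropy makes precise.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


n, p_a = 1000, 0.1                                 # illustrative values (assumption)
w = round(p_a * n)                                 # typical Hamming weight p_A * n
min_ent_per_bit = math.log2(1 / (1 - p_a))         # lg(1/(1 - p_A)), about 0.152
typical_per_bit = math.log2(math.comb(n, w)) / n   # (1/n) lg #{weight-w vectors}
print(min_ent_per_bit, typical_per_bit, h2(p_a))
```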

But recall that our goal is far more ambitious: Alice wants to send an arbitrary message of her choice. The obvious approach is to obtain a key $K$ as above and then send an error-corrected version of $K \oplus M$. But this at least halves the rate, which becomes far from optimal. Our approach instead is to use an extractor $\mathrm{Ext}$ that is invertible, in the sense that given $M$ and $S$, we can sample a random $R$ such that $\mathrm{Ext}(S, R) = M$. We then encrypt a message $M$ as $E(R)$, where $R$ is a random preimage of $M$ under $\mathrm{Ext}(S, \cdot)$. However, the above argument only yields, at best, security for randomly chosen messages. In contrast, showing DS-security amounts to proving, for any two messages $M_0$ and $M_1$, that $\mathrm{BSC}_{p_A}(E(R_0))$ and $\mathrm{BSC}_{p_A}(E(R_1))$ are statistically close, where $R_i$ is uniform subject to $\mathrm{Ext}(S, R_i) = M_i$. To make things even worse, the messages $M_0$ and $M_1$ are allowed to depend on the seed. The main challenge is that such a proof appears to require detailed knowledge of the combinatorial structure of $E$ and $\mathrm{Ext}$, as the actual ciphertext distribution depends on them.

Instead, we take a completely different approach: we show that any seeded encryption function with appropriate linearity properties is DS-secure whenever it is secure for random messages. This result is surprising, as random-message security does not, in general, imply chosen-message security. A careful choice of the extractor to satisfy these requirements, combined with the above idea, yields a DS-secure seeded encryption function. The final step is to remove the seed, which is done by transmitting it (error-corrected) and amortizing its impact on the rate to essentially zero by re-using it with the above seeded encryption function across blocks of the message. A hybrid argument is used to bound the decoding error and the loss in security.

SEEDED ENCRYPTION. A seeded encryption function $\mathcal{SE}: \mathrm{Sds} \times \{0,1\}^b \to \{0,1\}^n$ takes a seed $S \in \mathrm{Sds}$ and message $M \in \{0,1\}^b$ to return a sender ciphertext denoted $\mathcal{SE}(S, M)$ or $\mathcal{SE}_S(M)$; each seed $S$ defines an encryption function $\mathcal{SE}_S: \{0,1\}^b \to \{0,1\}^n$. There must be a corresponding seeded decryption function $\mathcal{SD}: \mathrm{Sds} \times \{0,1\}^n \to \{0,1\}^b$ such that $\mathcal{SD}(S, \mathcal{SE}(S, M)) = M$ for all $S, M$. We consider an extension of the standard wiretap setting where a seed $S \leftarrow_\$ \mathrm{Sds}$ is a public parameter, available to sender, receiver and adversary. We extend DS-security to this setting by letting $\mathrm{Adv}^{\mathrm{ds}}(\mathcal{SE};\mathrm{ChA})$ be the expectation, over $S$ drawn at random from $\mathrm{Sds}$, of $\mathrm{Adv}^{\mathrm{ds}}(\mathcal{SE}_S;\mathrm{ChA})$. The rate of $\mathcal{SE}$ is defined as $b/n$, meaning the seed is ignored.

    transform 𝒮ℰ(S, M):            // S ∈ Sds, M ∈ {0,1}^b
      R ←$ {0,1}^r ; Ret E(Inv(S, R, M))

    transform ℰ(M):                // M ∈ {0,1}^m
      S ←$ Sds ; S[1], ..., S[c] ← S
      M[1], ..., M[t] ← M
      For i = 1 to t do C[i] ←$ 𝒮ℰ(S, M[i])
      Ret E(S[1]) ‖ ··· ‖ E(S[c]) ‖ C[1] ‖ ··· ‖ C[t]

    transform 𝒟(C):                // C ∈ OutR^{(c+t)n}
      C[1], ..., C[c+t] ← C
      S ← D(C[1]) ‖ ··· ‖ D(C[c])
      For i = 1 to t do
        X[i] ← D(C[c+i])
        M[i] ← Ext(S, X[i])
      Ret M[1] ‖ ··· ‖ M[t]

Fig. 3. Encryption function ℰ = RItE_t[Inv, E] using 𝒮ℰ = ItE[Inv, E] and decryption function 𝒟. By X[1], ..., X[c] ← X we mean that the bc-bit string X is split into b-bit blocks.

EXTRACTORS. A function $\mathrm{Ext}: \mathrm{Sds} \times \{0,1\}^k \to \{0,1\}^b$ is an $(h, \alpha)$-extractor if $\mathrm{SD}((\mathrm{Ext}(S, X), Z, S); (U, Z, S)) \le \alpha$ for all pairs of (correlated) random variables $(X, Z)$ over $\{0,1\}^k \times \{0,1\}^*$ with $H_\infty(X|Z) \ge h$, where additionally $S$ and $U$ are uniform on $\mathrm{Sds}$ and $\{0,1\}^b$, respectively. (This is a strong, average-case extractor in the terminology of [16].) We say that $\mathrm{Ext}$ is regular if for all $S \in \mathrm{Sds}$, the function $\mathrm{Ext}(S, \cdot)$ is regular, meaning every point in the range has the same number of preimages.

INVERTING EXTRACTORS. We say that a function $\mathrm{Inv}: \mathrm{Sds} \times \{0,1\}^r \times \{0,1\}^b \to \{0,1\}^k$ is an inverter for an extractor $\mathrm{Ext}: \mathrm{Sds} \times \{0,1\}^k \to \{0,1\}^b$ if for all $S \in \mathrm{Sds}$ and $Y \in \{0,1\}^b$, and for $R$ uniform over $\{0,1\}^r$, the random variable $\mathrm{Inv}(S, R, Y)$ is uniformly distributed on $\{X \in \{0,1\}^k : \mathrm{Ext}(S, X) = Y\}$, the set of preimages of $Y$ under $\mathrm{Ext}(S, \cdot)$. To make this concrete we give an example of an extractor with an efficiently computable inverter. Recall that $k$-bit strings can be interpreted as elements of the finite field $\mathrm{GF}(2^k)$, allowing us to define a multiplication operator $\odot$ on $k$-bit strings. Then, for $\mathrm{Sds} = \{0,1\}^k \setminus \{0^k\}$, we consider the function $\mathrm{Ext}: \mathrm{Sds} \times \{0,1\}^k \to \{0,1\}^b$ which, on inputs $S \in \mathrm{Sds}$ and $X \in \{0,1\}^k$, outputs the first $b$ bits of $X \odot S$. It is easy to see that $\mathrm{Ext}$ is regular, as $0^k$ is not in the set of seeds: multiplication by a nonzero $S$ permutes $\mathrm{GF}(2^k)$, so every $b$-bit output has exactly $2^{k-b}$ preimages. In [4] we prove the following using the average-case version of the Leftover Hash Lemma of [20], due to [16].

Lemma 10. For all $\alpha \in (0, 1]$ and all $b \le k - 2\lg(1/\alpha) + 2$ the function $\mathrm{Ext}$ is a $(b + 2\lg(1/\alpha) - 2, \alpha)$-extractor.

An efficient inverter $\mathrm{Inv}: \mathrm{Sds} \times \{0,1\}^{k-b} \times \{0,1\}^b \to \{0,1\}^k$ is obtained via $\mathrm{Inv}(S, R, M) = S^{-1} \odot (M \,\|\, R)$, where $S^{-1}$ is the inverse of $S$ with respect to multiplication in $\mathrm{GF}(2^k)$.
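The finite-field extractor and its inverter are small enough to check exhaustively at toy size. This Python sketch assumes $k = 8$, $b = 3$, $\mathrm{GF}(2^8)$ with the AES reduction polynomial, and reads "first $b$ bits" as the most significant bits; the function names are hypothetical. The checks confirm, for a few seeds, that $\mathrm{Ext}(S, \cdot)$ is regular and that $\mathrm{Inv}(S, \cdot, M)$ maps the $2^{k-b}$ values of $R$ one-to-one onto the preimages of $M$, which is exactly the uniformity the inverter definition asks for.

```python
from collections import Counter

K, B = 8, 3   # toy k and b (assumption)
MOD = 0x11B   # AES polynomial x^8 + x^4 + x^3 + x + 1 defines GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^K) by shift-and-add with reduction modulo MOD."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= MOD
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero a, computed as a^(2^K - 2)."""
    result, e = 1, (1 << K) - 2
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def ext(s: int, x: int) -> int:
    """Ext(S, X): the first (most significant) B bits of X * S."""
    return gf_mul(x, s) >> (K - B)


def inv(s: int, r: int, m: int) -> int:
    """Inv(S, R, M) = S^{-1} * (M || R), with R a (K - B)-bit string."""
    return gf_mul(gf_inv(s), (m << (K - B)) | r)


def is_regular(s: int) -> bool:
    """Every B-bit output of Ext(S, .) has the same number of preimages."""
    counts = Counter(ext(s, x) for x in range(1 << K))
    return len(counts) == 1 << B and len(set(counts.values())) == 1


for s in (1, 0x53, 0xCA):
    assert is_regular(s)
    for m in range(1 << B):
        images = sorted(inv(s, r, m) for r in range(1 << (K - B)))
        preimages = sorted(x for x in range(1 << K) if ext(s, x) == m)
        assert images == preimages  # Inv(S, ., M) is a bijection onto the preimages of M
```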
THE RItE CONSTRUCTION. Let $\mathrm{Ext}: \mathrm{Sds} \times \{0,1\}^k \to \{0,1\}^b$ be a regular extractor with inverter $\mathrm{Inv}: \mathrm{Sds} \times \{0,1\}^r \times \{0,1\}^b \to \{0,1\}^k$. Also let $E: \{0,1\}^k \to \{0,1\}^n$ be an injective function with $k < n$, later to be instantiated via an ECC. Assume without loss of generality that for some $c \ge 1$, we have $|S| = c \cdot k$ for all $S \in \mathrm{Sds}$. The encryption function $\mathcal{E}$ is described in Fig. 3 and is obtained via the construction $\mathrm{RItE}_t$ (Repeat-Invert-then-Encode), where $t \ge 1$ is a parameter. As its main component, it relies on the construction ItE (Invert-then-Encode) of a seeded encryption function $\mathrm{ItE}[\mathrm{Inv}, E]: \mathrm{Sds} \times \{0,1\}^b \to \{0,1\}^n$, which applies the inverter $\mathrm{Inv}$ to the message and then applies $E$ to the result. The final, seed-free encryption function $\mathrm{RItE}_t[\mathrm{Inv}, E]$ then takes an input $M \in \{0,1\}^m$, where $m = t \cdot b$, splits it into $t$ $b$-bit blocks $M[1], \ldots, M[t]$, chooses a random seed $S$, and combines an encoding of $S$ with the encryptions of the blocks using $\mathcal{SE}_S$ for $\mathcal{SE} = \mathrm{ItE}[\mathrm{Inv}, E]$.
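The effect of amortizing the seed can be read off Fig. 3: $\mathrm{RItE}_t$ sends $c$ codewords for the seed and $t$ codewords for the message blocks, each $n$ bits long, to carry $m = t \cdot b$ message bits. A one-line Python helper (hypothetical name, illustrative numbers) makes the resulting rate $tb/((c + t)n)$ explicit and shows it approaching the ItE rate $b/n$ as $t$ grows.

```python
def rite_rate(b: int, n: int, c: int, t: int) -> float:
    """Rate of RItE_t[Inv, E]: m = t*b message bits sent as (c + t) codewords of n bits."""
    return (t * b) / ((c + t) * n)


# The seed costs c codewords once; as t grows the rate approaches b/n (the rate of ItE).
for t in (1, 10, 100, 1000):
    print(t, rite_rate(b=100, n=1000, c=1, t=t))
```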

DECRYPTABILITY. Given a channel $\mathrm{ChR}: \{0,1\} \to \mathrm{OutR}$, a decoder for $E$ over $\mathrm{ChR}$ is a function $D: \mathrm{OutR}^n \to \{0,1\}^k$. Its decoding error is defined as $\mathrm{DE}(E; D; \mathrm{ChR}) = \max_{M \in \{0,1\}^k} \Pr[D(\mathrm{ChR}(E(M))) \ne M]$. Accordingly, for any output alphabet $\mathrm{OutR}$ and decoder $D: \mathrm{OutR}^n \to \{0,1\}^k$, we define the corresponding decryption function $\mathcal{D}$ for $\mathcal{E}$ over $\mathrm{ChR}$ as in Fig. 3. The following lemma relates its decryption error to that of $D$.

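Lemma 11. [Correct decryption] Let $\mathrm{ChR}: \{0,1\} \to \mathrm{OutR}$ be a channel, and let $\mathcal{E}$, $\mathcal{D}$, $E$, and $D$ be as above. Then $\mathrm{DE}(\mathcal{E};\mathcal{D};\mathrm{ChR}) \le (c + t) \cdot \mathrm{DE}(E; D; \mathrm{ChR})$. □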

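STEP I: FROM RItE TO ItE. We reduce the security of RItE to that of ItE. The proof of the following [4] uses a hybrid argument.

Lemma 12. Let $t \ge 1$, $\mathcal{E} = \mathrm{RItE}_t[\mathrm{Inv}, E]$ and $\mathcal{SE} = \mathrm{ItE}[\mathrm{Inv}, E]$. For all $\mathrm{ChA}: \{0,1\} \to \mathrm{OutA}$ we have $\mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA}) \le t \cdot \mathrm{Adv}^{\mathrm{ds}}(\mathcal{SE};\mathrm{ChA})$. □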

STEP II: RDS-SECURITY OF ItE. Towards determining the DS-security of ItE we first address the seemingly simpler question of proving security under random messages. Specifically, for a seeded encryption function $\mathcal{SE}: \mathrm{Sds} \times \{0,1\}^b \to \{0,1\}^n$, we define the rds advantage $\mathrm{Adv}^{\mathrm{rds}}(\mathcal{SE};\mathrm{ChA})$ as the expectation of $\mathrm{SD}((\mathrm{ChA}(\mathcal{SE}(S, U)), U); (\mathrm{ChA}(\mathcal{SE}(S, U')), U))$, where $U$ and $U'$ are uniformly chosen and independent $b$-bit messages, and the expectation is taken over the choice of the seed $S$. Exploiting the notion of $\epsilon$-smooth min-entropy [36], the following, proven in [4], establishes RDS-security of ItE:

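Lemma 13. [RDS-security of ItE] Let $\delta > 0$, let $\mathrm{ChA}: \{0,1\} \to \mathrm{OutA}$ be a symmetric channel, let $\mathrm{Inv}: \mathrm{Sds} \times \{0,1\}^r \times \{0,1\}^b \to \{0,1\}^k$ be the inverter of a regular $(k - n \cdot (\lg(|\mathrm{OutA}|) - H(\mathrm{ChA}) + \delta), \alpha)$-extractor, and let $E: \{0,1\}^k \to \{0,1\}^n$ be injective. Then, for $\mathcal{SE} = \mathrm{ItE}[\mathrm{Inv}, E]$, we have

$$\mathrm{Adv}^{\mathrm{rds}}(\mathcal{SE};\mathrm{ChA}) \le 2 \cdot 2^{-\frac{n\delta^2}{2\lg^2(|\mathrm{OutA}|+3)}} + \alpha . \quad \square$$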

STEP III: FROM RDS- TO DS-SECURITY. In contrast to RDS-security, proving DS-security of ItE seems to require a better grasp of the combinatorial structure of $E$ and $\mathrm{Inv}$. More concretely, think of any randomized (seeded) encryption function $\mathcal{SE}: \mathrm{Sds} \times \{0,1\}^b \to \{0,1\}^n$ as a deterministic map $\mathcal{SE}: \mathrm{Sds} \times \{0,1\}^r \times \{0,1\}^b \to \{0,1\}^n$ (for some $r$), where the second argument takes the role of the random coins. We call $\mathcal{SE}$ separable if $\mathcal{SE}(S, R, M) = \mathcal{SE}(S, R, 0^b) \oplus \mathcal{SE}(S, 0^r, M)$ for all $S \in \mathrm{Sds}$, $R \in \{0,1\}^r$, and $M \in \{0,1\}^b$. Also, it is message linear if $\mathcal{SE}(S, 0^r, \cdot)$ is linear for all $S \in \mathrm{Sds}$. The following is true for encryption functions with both these properties, and is proven in [4].

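Lemma 14. [RDS ⇒ DS] Let $\mathrm{ChA}: \{0,1\} \to \mathrm{OutA}$ be symmetric. If $\mathcal{SE}$ is separable and message linear, then $\mathrm{Adv}^{\mathrm{ds}}(\mathcal{SE};\mathrm{ChA}) \le 2 \cdot \mathrm{Adv}^{\mathrm{rds}}(\mathcal{SE};\mathrm{ChA})$. □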

Coming back to ItE, we say that $\mathrm{Inv}: \mathrm{Sds} \times \{0,1\}^r \times \{0,1\}^b \to \{0,1\}^k$ is output linear if $\mathrm{Inv}(S, 0^r, \cdot)$ is linear for all $S \in \mathrm{Sds}$. Moreover, it is separable if $\mathrm{Inv}(S, R, M) = \mathrm{Inv}(S, R, 0^b) \oplus \mathrm{Inv}(S, 0^r, M)$ for all $S \in \mathrm{Sds}$, $R \in \{0,1\}^r$, and $M \in \{0,1\}^b$. For example, the inverter for the above extractor based on finite-field multiplication is easily seen to be output linear and separable, by the linearity of the map $M \mapsto S^{-1} \odot M$.
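Output linearity and separability of this inverter, and the resulting message linearity and separability of ItE once $E$ is linear, can be checked exhaustively at toy size. The Python sketch below reuses the toy assumptions $k = 8$, $b = 3$ and the AES field polynomial, and takes as $E$ a hypothetical linear, injective encoder that concatenates three copies of its input; all names are illustrative. Everything here is linear over $\mathrm{GF}(2)$ because field multiplication distributes over XOR.

```python
K, B = 8, 3          # toy k and b (assumption)
R_BITS = K - B       # r = k - b coin bits
MOD = 0x11B          # AES polynomial defines GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^K) by shift-and-add with reduction modulo MOD."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= MOD
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero a, computed as a^(2^K - 2)."""
    result, e = 1, (1 << K) - 2
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def inv(s: int, r: int, m: int) -> int:
    """Finite-field inverter Inv(S, R, M) = S^{-1} * (M || R)."""
    return gf_mul(gf_inv(s), (m << R_BITS) | r)


def encode(x: int) -> int:
    """Hypothetical linear, injective E: three concatenated copies of x (3K bits)."""
    return x | (x << K) | (x << 2 * K)


def ite(s: int, r: int, m: int) -> int:
    """ItE[Inv, E] viewed as a deterministic map of (seed, coins, message)."""
    return encode(inv(s, r, m))


def separable(f, s: int) -> bool:
    """f(S, R, M) == f(S, R, 0^b) xor f(S, 0^r, M) for all R, M."""
    return all(f(s, r, m) == f(s, r, 0) ^ f(s, 0, m)
               for r in range(1 << R_BITS) for m in range(1 << B))


def linear_in_message(f, s: int) -> bool:
    """f(S, 0^r, .) is linear: f(S, 0, M1 xor M2) == f(S, 0, M1) xor f(S, 0, M2)."""
    return all(f(s, 0, m1 ^ m2) == f(s, 0, m1) ^ f(s, 0, m2)
               for m1 in range(1 << B) for m2 in range(1 << B))


print(separable(ite, 0x53), linear_in_message(ite, 0x53))
```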

SECURITY. If we instantiate $\mathrm{ItE}[\mathrm{Inv}, E]$ so that $\mathrm{Inv}$ is both output linear and separable, and we let $E$ be linear, the encryption function $\mathcal{SE}$ is easily seen to be message linear and separable. The following theorem now follows immediately by combining Lemma 12, Lemma 14, and Lemma 13.

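Theorem 15. [DS-security of RItE] Let $\delta > 0$ and $t \ge 1$. Also, let $\mathrm{ChA}: \{0,1\} \to \mathrm{OutA}$ be a symmetric channel, let $\mathrm{Inv}: \mathrm{Sds} \times \{0,1\}^r \times \{0,1\}^b \to \{0,1\}^k$ be the output-linear and separable inverter of a regular $(k - n \cdot (\lg(|\mathrm{OutA}|) - H(\mathrm{ChA}) + \delta), \alpha)$-extractor, and let $E: \{0,1\}^k \to \{0,1\}^n$ be linear and injective. Then, for $\mathcal{E} = \mathrm{RItE}_t[\mathrm{Inv}, E]$, we have

$$\mathrm{Adv}^{\mathrm{ds}}(\mathcal{E};\mathrm{ChA}) \le 2t \cdot \Big(2 \cdot 2^{-\frac{n\delta^2}{2\lg^2(|\mathrm{OutA}|+3)}} + \alpha\Big) . \quad \square$$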

Instantiating the scheme. Recall that if $\mathsf{ChA}: \{0,1\}^l \to \mathsf{OutA}$ and $\mathsf{ChR}:
\{0,1\}^l \to \mathsf{OutR}$ are symmetric channels, their secrecy capacity equals [27]
$(H(U \mid \mathsf{ChA}(U)) - H(U \mid \mathsf{ChR}(U)))/l$, for a uniform $l$-bit $U$. Also, for a channel
ChR, we denote its (Shannon) capacity as $C(\mathsf{ChR}) = \max_X I(X; \mathsf{ChR}(X))/l$. We
will need the following result (cf., e.g., [18] for a proof).

Lemma 16. [18] For any $l \in \mathbb{N}$ and any channel $\mathsf{ChR}: \{0,1\}^l \to \mathsf{OutR}$,
there is a family $E = \{E_s\}_{s \in \mathbb{N}}$ of linear encoding functions $E_s: \{0,1\}^{k(s)} \to
\{0,1\}^{n(s)}$ (where $n(s)$ is a multiple of $l$), with corresponding decoding functions
$D_s: \mathsf{OutR}^{n(s)/l} \to \{0,1\}^{k(s)}$, such that (i) $\mathrm{DE}(E_s; D_s; \mathsf{ChR}) = 2^{-\Theta(n(s))}$, (ii)
$\lim_{s \to \infty} k(s)/n(s) = C(\mathsf{ChR})$, and (iii) $E$ and $D$ are PT computable.
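For the binary symmetric channels used below, the secrecy-capacity formula recalled above (with l = 1) can be evaluated directly. For a uniform bit U, $H(U \mid \mathsf{BSC}_p(U)) = h_2(p)$, so the secrecy capacity of the pair is $h_2(p_A) - h_2(p_R)$. The following Python sketch computes the conditional entropy from the joint distribution rather than assuming the closed form. The function names and example crossover probabilities are hypothetical.

```python
from math import log2

def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def cond_entropy_uniform_bsc(p: float) -> float:
    """H(U | Y) for uniform U in {0,1} and Y = BSC_p(U), computed from the joint distribution."""
    joint = {(u, y): 0.5 * (p if u != y else 1 - p) for u in (0, 1) for y in (0, 1)}
    total = 0.0
    for y in (0, 1):
        p_y = joint[(0, y)] + joint[(1, y)]
        for u in (0, 1):
            if joint[(u, y)] > 0:
                total -= joint[(u, y)] * log2(joint[(u, y)] / p_y)
    return total

def secrecy_capacity_bsc(p_r: float, p_a: float) -> float:
    """(H(U|ChA(U)) - H(U|ChR(U))) / l with l = 1, for ChA = BSC_{p_a} and ChR = BSC_{p_r}."""
    return cond_entropy_uniform_bsc(p_a) - cond_entropy_uniform_bsc(p_r)
```

For example, `secrecy_capacity_bsc(0.05, 0.2)` is about 0.436 bits per channel use.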

We now derive a scheme $\mathcal{E} = \{\mathcal{E}_s\}_{s \in \mathbb{N}}$ achieving secrecy capacity for the most
common case $\mathsf{ChR} = \mathsf{BSC}_{p_R}$ and $\mathsf{ChA} = \mathsf{BSC}_{p_A}$, where $0 \le p_R < p_A \le 1/2$. We
start with a family of codes $\{E_s\}_{s \in \mathbb{N}}$ for $\mathsf{BSC}_{p_R}$ guaranteed to exist by Lemma 16,
where $E_s: \{0,1\}^{k(s)} \to \{0,1\}^{n(s)}$ and $\lim_{s \to \infty} k(s)/n(s) = 1 - h_2(p_R)$, or,
equivalently, there exists $\nu$ such that $\nu(s) = o(1)$ and $k(s) = (1 - h_2(p_R) - \nu(s)) \cdot n(s)$.
Then, we let $\delta(s) = (2 \lg^2(5))^{1/2} \cdot n(s)^{-1/4}$ and $\alpha(s) = 2^{-n(s)^{1/2}}$ and use
the finite-field based extractor $\mathsf{Ext}_s: \{0,1\}^{k(s)} \times \{0,1\}^{k(s)} \to \{0,1\}^{b(s)}$, where

$$b(s) = k(s) - n(s) \cdot (1 - h_2(p_A) + \delta(s)) + 2 \lg(\alpha(s)) = \left( h_2(p_A) - h_2(p_R) - \nu(s) - \delta(s) - 2 \cdot n(s)^{-1/2} \right) \cdot n(s).$$

We note that the resulting scheme is equivalent to the
one described in the introduction (with $A = S^{-1}$). With these parameters,

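$$\mathbf{Adv}^{\mathrm{ds}}(\mathcal{E}_s; \mathsf{BSC}_{p_A}) \le 6 \cdot t(s) \cdot 2^{-\sqrt{n(s)}}, \qquad \mathrm{DE}(\mathcal{E}_s; \mathcal{D}_s; \mathsf{BSC}_{p_R}) \le (t(s) + 1) \cdot 2^{-\Theta(n(s))}$$

by Theorem 15 and Lemma 11, respectively. The rate of $\mathcal{E}_s$ is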

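$$\mathrm{Rate}(\mathcal{E}_s) = \frac{t(s)}{t(s) + 1} \cdot \left( h_2(p_A) - h_2(p_R) - \nu(s) - \delta(s) - \frac{2}{\sqrt{n(s)}} \right).$$

Setting $t(s) = \lg(k(s))$ yields $\lim_{s \to \infty} \mathrm{Rate}(\mathcal{E}_s) = h_2(p_A) - h_2(p_R)$.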
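The parameter choices can be sanity-checked numerically. The Python sketch below treats ν(s) as a free input and uses hypothetical example values. It confirms three things:

- with |OutA| = 2, the Theorem 15 bound collapses to $6 t(s) 2^{-\sqrt{n(s)}}$, because $n\delta^2/(2\lg^2 5) = \sqrt{n}$;
- b(s)/n(s) matches the closed form;
- the rate with t(s) = lg(k(s)) increases toward $h_2(p_A) - h_2(p_R)$.

The value α(s) is kept in log form because $2^{-\sqrt{n}}$ underflows for large n.

```python
from math import log2, sqrt

def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def theorem15_bound(n, t, delta, alpha, out_size):
    """Right-hand side of Theorem 15: 2t * (2 * 2^(-n delta^2 / (2 lg^2(|OutA| + 3))) + alpha)."""
    exponent = n * delta ** 2 / (2 * log2(out_size + 3) ** 2)
    return 2 * t * (2 * 2.0 ** (-exponent) + alpha)

def bsc_parameters(n: float, nu: float, p_r: float, p_a: float) -> dict:
    """Parameter choices from the text for ChR = BSC_{p_r} and ChA = BSC_{p_a}."""
    k = (1 - h2(p_r) - nu) * n
    delta = sqrt(2 * log2(5) ** 2) * n ** -0.25
    log2_alpha = -sqrt(n)  # alpha = 2^(-sqrt(n)), kept in log form to avoid underflow in b
    b = k - n * (1 - h2(p_a) + delta) + 2 * log2_alpha
    t = log2(k)            # t(s) = lg(k(s))
    rate = t / (t + 1) * b / n
    return {"k": k, "delta": delta, "alpha": 2.0 ** log2_alpha, "b": b, "t": t, "rate": rate}
```

Convergence of the rate is slow, largely because δ(s) shrinks only as $n(s)^{-1/4}$.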

Extensions. The proof applies also for any pair of symmetric channels ChR
and ChA, and the resulting rate is the secrecy capacity if the capacity of $\mathsf{ChA}:
\{0,1\} \to \mathsf{OutA}$ is $\lg(|\mathsf{OutA}|) - H(\mathsf{ChA})$, which is the case if and only if a
uniform input to ChA produces a uniform output. For other channels, such as
erasure channels (where each bit is left unchanged with probability $\delta$ and mapped
to an erasure symbol with probability $1 - \delta$), our technique still yields good
schemes which, however, may fall short of achieving capacity. We also remark
that the above presentation is constrained to single input-bit base channels for
simplicity only. Our results can be extended to discrete memoryless channels with
$l$-bit inputs for $l > 1$. For example, Lemma 13 extends to arbitrary symmetric
channels $\mathsf{ChA}: \{0,1\}^l \to \mathsf{OutA}$, at the price of replacing $n$ by $n/l$ in the security
bound and in the extractor's entropy requirement. In contrast, we do not know
whether Lemma 14 applies to arbitrary symmetric channels with $l$-bit inputs, but
it does, for instance, extend to any channel of the form $\mathsf{ChA}(X) = X \oplus E$, where
$E$ is an $l$-bit string sampled according to an input-independent noise distribution.
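To see concretely why erasure channels can fall short, the sketch below looks at an erasure channel that keeps each bit with probability δ. Its output alphabet has three symbols, and the output entropy on a fixed input is h_2(δ). The code compares the quantity lg(|OutA|) − H(ChA) = lg 3 − h_2(δ) from the extractor requirement with the channel's Shannon capacity, which for an erasure channel is the standard value δ. The gap vanishes only at δ = 2/3, exactly where a uniform input produces a uniform output over the three symbols. Function names are hypothetical.

```python
from math import log2

def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def erasure_gap(delta: float) -> float:
    """(lg|OutA| - H(ChA)) - C(ChA) for an erasure channel keeping each bit w.p. delta.

    OutA = {0, 1, erasure}, so lg|OutA| = lg 3; the output entropy on a fixed input
    is h2(delta); the Shannon capacity of the erasure channel is delta.
    """
    return log2(3) - h2(delta) - delta

def output_distribution(delta: float) -> tuple:
    """Output distribution (P[0], P[1], P[erasure]) for a uniform input bit."""
    return (delta / 2, delta / 2, 1 - delta)
```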
Abstract. This paper develops a theory of multi-instance (mi) security
and applies it to provide the first proof-based support for the classical
practice of salting in password-based cryptography. Mi-security comes
into play in settings (like password-based cryptography) where it is
computationally feasible to compromise a single instance, and provides a
second line of defense, aiming to ensure (in the case of passwords, via
salting) that the effort to compromise all of some large number m of
instances grows linearly with m. The first challenge is definitions, where
we suggest LORX-security as a good metric for mi security of encryption
and support this claim by showing it implies other natural metrics,
illustrating in the process that even lifting simple results from the
si setting to the mi one calls for new techniques. Next we provide a
composition-based framework to transfer standard single-instance (si)
security to mi-security with the aid of a key-derivation function. Analyzing
password-based KDFs from the PKCS#5 standard to show that
they meet our indifferentiability-style mi-security definition for KDFs,
we are able to conclude with the first proof that per-password salts
amplify mi-security as hoped in practice. We believe that mi-security is of
interest in other domains and that this work provides the foundation for
its further theoretical development and practical application.


1  Introduction 

This  paper  develops  a  theory  of  multi-instance  security  and  applies  it  to  support 
practices  in  password-based  cryptography. 

Background. Password-based encryption (PBE) in practice is based on the
PKCS#5 (equivalently, RFC 2898) standard [32]. It encrypts a message M under
a password pw by picking a random s-bit salt sa, deriving a key $L \leftarrow \mathrm{KD}(pw \| sa)$
and returning $C' \leftarrow C \| sa$ where $C \leftarrow_{\$} \mathcal{E}(L, M)$. Here $\mathcal{E}$ is a symmetric encryption
scheme, typically an IND-CPA AES mode of operation, and the key-derivation
function (KDF) $\mathrm{KD}: \{0,1\}^* \to \{0,1\}^n$ is the c-fold iteration $\mathrm{KD} = H^c$ of a
cryptographic hash function $H: \{0,1\}^* \to \{0,1\}^n$. However, passwords are
often poorly chosen [29], falling within a set D called a “dictionary” that is small
enough to exhaust. A brute-force attack now recovers the target password pw
(thereby breaking the ind-cpa security of the encryption) using cN hashes, where
$N = |D|$ is the size of the dictionary. Increasing c increases this effort, explaining
the role of this iteration count, but c cannot be made too large without adversely
impacting the performance of PBE.
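The PBE construction just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions:

- H is SHA-256 and the salt is 128 bits;
- a toy HMAC-SHA256 keystream cipher with a random nonce stands in for the IND-CPA AES mode.

None of these choices comes from the text, and the toy cipher is for illustration only. The key derivation follows the iterated shape KD = H^c applied to pw‖sa. This is the same shape as PBKDF1 in PKCS#5, which hashes P‖S once and then re-hashes the result c − 1 more times. All function names are hypothetical.

```python
import hashlib
import hmac
import os

def kd(pw: bytes, sa: bytes, c: int) -> bytes:
    """c-fold iterated hash KD = H^c applied to pw || sa, with H = SHA-256 (assumed)."""
    x = pw + sa
    for _ in range(c):
        x = hashlib.sha256(x).digest()
    return x

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256(key, nonce || counter), standing in for an AES mode."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def pbe_encrypt(pw: bytes, msg: bytes, c: int, salt_bytes: int = 16) -> bytes:
    """Pick a random salt sa, derive L = KD(pw || sa), encrypt, and return C' = C || sa."""
    sa = os.urandom(salt_bytes)
    key = kd(pw, sa, c)
    nonce = os.urandom(16)  # random coins of the toy scheme
    body = bytes(m ^ k for m, k in zip(msg, _keystream(key, nonce, len(msg))))
    return nonce + body + sa  # C = nonce || body, then C' = C || sa

def pbe_decrypt(pw: bytes, ctxt: bytes, c: int, salt_bytes: int = 16) -> bytes:
    """Split off the salt, re-derive the key, and undo the keystream."""
    sa, nonce, body = ctxt[-salt_bytes:], ctxt[:16], ctxt[16:-salt_bytes]
    key = kd(pw, sa, c)
    return bytes(b ^ k for b, k in zip(body, _keystream(key, nonce, len(body))))
```

Because the salt travels in the clear with the ciphertext, decryption needs only the password and the iteration count.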

Consider now m users, the i-th with password $pw_i$. If the salt is absent (s =
0), the number of hashes for the brute-force attack to recover all m passwords
remains around cN, but if s is large enough that salts are usually distinct, it
rises to mcN, becoming prohibitive for large m. Salting, thus, aims to make the
effort to compromise m target passwords scale linearly in m. (It has no effect on
the security of encryption under any one, particular target password.)
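The hash counts in this paragraph can be reproduced with a small experiment. The Python sketch below wraps SHA-256 in a counter and runs a dictionary attack that exhausts the dictionary.

- Without salts, one table of $H^c(pw)$ values serves all m users, for a total of cN hashes.
- With distinct salts, the table must be rebuilt per user, for a total of mcN hashes.

The dictionary, iteration count, and salts are hypothetical toy values.

```python
import hashlib

class CountingHash:
    """SHA-256 wrapper that counts how many times H is evaluated."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, data: bytes) -> bytes:
        self.calls += 1
        return hashlib.sha256(data).digest()

def kd(h, pw: bytes, sa: bytes, c: int) -> bytes:
    """KD(pw || sa) = H^c(pw || sa)."""
    x = pw + sa
    for _ in range(c):
        x = h(x)
    return x

def attack_unsalted(h, dictionary, keys, c):
    """No salt: one table of H^c(pw) over the dictionary serves every user (cN hashes)."""
    table = {kd(h, pw, b"", c): pw for pw in dictionary}
    return [table.get(key) for key in keys]

def attack_salted(h, dictionary, salted_keys, c):
    """Distinct salts: the dictionary is re-hashed under each user's salt (mcN hashes)."""
    recovered = []
    for key, sa in salted_keys:
        table = {kd(h, pw, sa, c): pw for pw in dictionary}
        recovered.append(table.get(key))
    return recovered

def demo(n_words: int = 50, c: int = 20, m: int = 4):
    """Return (hashes without salt, hashes with salts, whether all passwords were found)."""
    dictionary = [f"word{i}".encode() for i in range(n_words)]
    passwords = [dictionary[i * 7 % n_words] for i in range(m)]
    salts = [bytes([i]) * 8 for i in range(m)]  # distinct 64-bit salts (illustrative)
    setup = CountingHash()                      # the users' own hashing is not counted below
    plain_keys = [kd(setup, pw, b"", c) for pw in passwords]
    salted_keys = [(kd(setup, pw, sa, c), sa) for pw, sa in zip(passwords, salts)]

    h_plain, h_salted = CountingHash(), CountingHash()
    ok = (attack_unsalted(h_plain, dictionary, plain_keys, c) == passwords
          and attack_salted(h_salted, dictionary, salted_keys, c) == passwords)
    return h_plain.calls, h_salted.calls, ok
```

With the defaults, `demo()` reports 1000 hashes without salts and 4000 with salts: the factor m appears exactly.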

New  directions.  This  practice,  in  our  view,  opens  a  new  vista  in  theoretical 
cryptography,  namely  to  look  at  the  multi-instance  (mi)  security  of  a  scheme.  We 
would  seek  metrics  of  security  under  which  an  adversary  wins  when  it  breaks  all 
of  m  instances  but  not  if  it  breaks  fewer.  This  means  that  the  mi  security  could 
potentially  be  much  higher  than  the  traditional  single-instance  (si)  security.  We 
would  have  security  amplification. 

Why  do  this?  As  the  above  discussion  of  password-based  cryptography  shows, 
there  are  settings  where  the  computational  effort  t  needed  to  compromise  a  single 
instance  is  feasible.  Rather  than  give  up,  we  provide  a  second  line  of  defense. 
We  limit  the  scale  of  the  damage,  ensuring  (in  the  case  of  passwords,  via  the 
mechanism  of  salting)  that  the  computational  effort  to  compromise  all  of  m 
instances is (around) tm and thus prohibitive for large m. We can’t prevent the
occasional  illness,  but  we  can  prevent  an  epidemic. 

We  initiate  the  study  of  multi-instance  security  with  a  foundational  treatment 
in  two  parts.  The  first  part  is  agnostic  to  whether  the  setting  is  password-based 
or  not,  providing  definitions  for  different  kinds  of  mi-security  of  encryption  and 
establishing  relations  between  them,  concluding  with  the  message  that  what  we 
call  LORX-security  is  a  good  choice.  The  second  part  of  our  treatment  focuses 
on  password-based  cryptography,  providing  a  modular  framework  that  proves 
mi-security  of  password-based  primitives  by  viewing  them  as  obtained  by  the 
composition of a mi-secure KDF with a si-secure primitive, and yielding in particular
the first proof that salting works as expected to increase multi-instance
security  under  a  strong  and  formal  metric  for  the  latter. 

Multi-instance security turns out to be challenging both definitionally (providing
metrics where the adversary wins on breaking all instances but not fewer)
and technically (reductions need to preserve tiny advantages and standard hybrid
arguments no longer work). It also connects in interesting ways to security
amplification via direct products and xor lemmas, e.g. [37,16,19,30,13,27,34,28,35].
(We import some of their methods and export some novel viewpoints.)
We believe there are many fruitful directions for future work, both theoretical
(pursuing the connection with security amplification) and applied (mi security
could be valuable in public-key cryptography where steadily improving attacks
are making current security parameters look uncomfortably close to the edge for
single-instance security). Let us now look at all this in some more detail.
LORX. We consider a setting with m independent target keys $K_1, \ldots, K_m$.
(They may, but need not, be passwords.) In order to show that mi-security
grows with m we want a metric (definition) where the adversary wins if it breaks
all m instances of the encryption but does not win if it breaks strictly fewer. If
“breaking” is interpreted as recovery of the key then such a metric is easily given:
it is the probability that the adversary recovers all m target keys. We refer to
this as the UKU (Universal Key Unrecoverability) metric. But we know very
well that key-recovery is a weak metric of encryption security. We want instead
a mi analog of ind-cpa. The first thing that might come to mind is multi-user
security [3,2]. But in the latter the adversary wins (gets an advantage of one)
even if it breaks just one instance, so the mu-advantage of an adversary can never
be less than its si (ind-cpa) advantage. We, in contrast, cannot “give up” once
a single instance is broken. Something radically different is needed.

Our answer is LORX (left-or-right xor indistinguishability). Our game picks
m independent challenge bits $b_1, \ldots, b_m$ and gives the adversary an oracle $\mathrm{Enc}(\cdot, \cdot, \cdot)$
which given $i, M_0, M_1$ returns an encryption of $M_{b_i}$ under $K_i$. The adversary
outputs a bit $b'$ and its advantage is $2 \Pr[b' = b_1 \oplus \cdots \oplus b_m] - 1$.⁴ Why xor? Its
well-known “sensitivity” means that even if the adversary figures out m − 1 of
the challenge bits, it will have low advantage unless it also figures out the last.
This intuitive and historical support is strengthened by the relations, discussed
below, that show that LORX implies security under other natural metrics.
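One way to quantify the xor “sensitivity” is the piling-up lemma. Suppose the adversary's guess for each $b_i$ is independently correct with probability $(1+\varepsilon_i)/2$. Then its guess for $b_1 \oplus \cdots \oplus b_m$ is correct with probability $(1 + \prod_i \varepsilon_i)/2$, so a single bit guessed with $\varepsilon_i = 0$ drives the xor advantage to zero. The independence assumption belongs to this illustration, not to the LORX game. The function name is hypothetical. The Python sketch checks the identity by exact enumeration with fractions.

```python
from fractions import Fraction
from itertools import product

def xor_guess_advantage(eps: list) -> Fraction:
    """Exact 2*Pr[xor of guesses = xor of bits] - 1 when guess i is independently
    correct with probability (1 + eps[i]) / 2; enumerates all 2^m error patterns."""
    win = Fraction(0)
    for errors in product((0, 1), repeat=len(eps)):
        p = Fraction(1)
        for e, err in zip(map(Fraction, eps), errors):
            p *= (1 - e) / 2 if err else (1 + e) / 2
        if sum(errors) % 2 == 0:  # an even number of wrong guesses cancels in the xor
            win += p
    return 2 * win - 1
```

The product form explains the design goal: an adversary that learns m − 1 bits perfectly but guesses the last one blindly gains nothing.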

Relations. The novelty of multi-instance security prompts us to step back
and consider a broad choice of definitions. Besides UKU and LORX, we define
RORX (real-or-random xor indistinguishability, a mi-adaptation of the si ROR
notion of [4]) and a natural AND metric where the challenge bits $b_1, \ldots, b_m$ and
oracle $\mathrm{Enc}(\cdot, \cdot, \cdot)$ are as in the LORX game but the adversary output is a vector
$(b'_1, \ldots, b'_m)$ and its advantage is $\Pr[(b'_1, \ldots, b'_m) = (b_1, \ldots, b_m)] - 2^{-m}$. The
relations we provide, summarized in Figure 1, show that LORX emerges as the
best choice because it implies all the others with tight reductions. Beyond that,
they illustrate that the mi terrain differs from the si one in perhaps surprising
ways, both in terms of relations and the techniques needed to establish them.

Thus, in the si setting, LOR and ROR are easily shown equivalent up to a
factor 2 in the advantages [4]. It continues to be true that LORX easily implies
RORX but the hybrid argument used to prove that ROR implies LOR [4] does
not easily extend to the mi setting and the proof that RORX implies LORX
is not only more involved but incurs a factor $2^m$ loss.⁵ In the si setting, both

4 This is a simplification of our actual definition, which allows the adversary to
adaptively corrupt instances to reveal the underlying keys and challenge bits. This
capability means that LORX-security implies threshold security, where the adversary
wins if it predicts the xor of the challenge bits of some subset of the instances of its
choice. See Section 2 for further justification for this feature of the model.

5 This (exponential) $2^m$ factor loss is a natural consequence of the factor of 2 loss in
the si case; our bound is tight, and the loss in applications is usually small because
advantages are already exponentially vanishing in m. Nonetheless it is not always
negligible, and it makes LORX preferable to RORX.
[Figure 1: diagram of the relations among LORX, RORX, AND and UKU]

Fig. 1. Notions of multi-instance security for encryption and their relations.
LORX (left-or-right xor indistinguishability) emerges as the strongest, tightly
implying RORX (real-or-random xor indistinguishability) and UKU (universal
key-unrecoverability). The dashed line indicates that under some (mild, usually met)
conditions LORX also implies AND. RORX implies LORX and UKU but with a $2^m$ loss
in advantage, where m is the number of instances, making LORX a better choice.


LOR and ROR are easily shown to imply KU (key unrecoverability). Showing
LORX implies UKU is more involved, needing a boosting argument to ensure
preservation of exponentially vanishing advantages. This reduction is tight but,
interestingly, the reduction showing RORX implies UKU is not, incurring a
$2^m$-factor loss, again indicating that LORX is a better choice. We show that LORX
usually implies AND by exploiting a direct product theorem by Unger [35],
evidencing the connections with this area. Another natural metric of mi-security is
a threshold one, but our incorporation of corruptions means that LORX implies
security under this metric.

Mi-security of PBE. Under the LORX metric, we prove that the advantage
$\epsilon'$ obtained by a time-t adversary against m instances of the above PBE scheme
$\mathcal{E}'$ is at most $\epsilon + (q/(mcN))^m$ (we are dropping negligible terms), where q is the
number of adversary queries to the RO H and $\epsilon$ is the advantage of a time-t ind-cpa
(si) adversary against $\mathcal{E}$. This is the desired result saying that salting works to
provide a second line of defense under a strong mi security metric, amplifying
security linearly in the number of instances.
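To get a feel for the $(q/(mcN))^m$ term, the Python sketch below evaluates it exactly, dropping the $\epsilon$ and negligible terms. It also evaluates the single-instance quantity q/(cN). Take the hypothetical values c = 1000 and N = 2^20. A budget of q = cN queries is enough to exhaust the dictionary once, and it makes the single-instance term 1, but it leaves the m = 10 term at $10^{-10}$. The function names are hypothetical.

```python
from fractions import Fraction

def mi_pbe_term(q: int, m: int, c: int, N: int) -> Fraction:
    """The (q / (m c N))^m term of the mi bound for m instances of salted PBE."""
    return Fraction(q, m * c * N) ** m

def si_term(q: int, c: int, N: int) -> Fraction:
    """Single-instance analogue q / (c N), capped at 1 since it bounds a probability."""
    return min(Fraction(q, c * N), Fraction(1))
```

Halving the full mcN budget already caps the mi term at $2^{-m}$, which is the sense in which the attacker's effort must grow linearly with m.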

Framework. This result for PBE is established in a modular (rather than ad
hoc) way, via a framework that yields corresponding results for any password-based
primitive. This means not only ones like password-based message authentication
(also covered in PKCS#5) or password-based authenticated encryption
(WinZip) but public-key primitives like password-based digital signatures, where
the signing key is derived from a password. We view a password-based scheme for
a goal as derived by composing a key-derivation function (KDF) with a standard
(si) scheme for the same goal. The framework then has the following components.
(1) We provide a definition of mi-security for KDFs. (2) We provide composition
theorems, showing that composing a mi-secure KDF with a si-secure scheme for
a goal results in a mi-secure scheme for that goal. (We will illustrate this for
the case of encryption but similar results may be shown for other primitives.)
(3) We analyze the iterated hash KDF of PKCS#5 and establish its mi security.

The statements above are qualitative. The quantitative security aspect is
crucial. The definition of mi-security of KDFs must permit showing mi-security
much higher than si-security. The reductions in the composition theorems must
preserve exponentially vanishing mi-advantages. And the analysis of the PKCS#5
KDF must prove that the adversary advantage in q queries to the RO H grows
as $(q/(cmN))^m$, not merely $q/(cN)$. These quantitative constraints represent
important technical challenges.

Mi-security of KDFs. We expand on item (1) above. The definition of
mi-security we provide for KDFs is a simulation-based one inspired by the
indifferentiability framework [26,11]. The attacker must distinguish between the real
world and an ideal counterpart. In both, target passwords $pw_1, \ldots, pw_m$ and
salts $sa_1, \ldots, sa_m$ are randomly chosen. In the real world, the adversary gets
input $(pw_1, sa_1, \mathrm{KD}(pw_1 \| sa_1)), \ldots, (pw_m, sa_m, \mathrm{KD}(pw_m \| sa_m))$ and also gets an
oracle for the RO hash function H used by KD. In the ideal world, the input
is $(pw_1, sa_1, L_1), \ldots, (pw_m, sa_m, L_m)$ where the keys $L_1, \ldots, L_m$ are randomly
chosen, and the oracle is a simulator. The simulator itself has access to a Test
oracle that will take a guess for a password and tell the simulator whether or
not it matches one of the target passwords. Crucially, we require that when the
number of queries made by the adversary to the simulator is q, the number of
queries made by the simulator to its Test oracle is only q/c. This restriction is
critical to our proof of security amplification and a source of challenges therein.

Related work. Previous work which aimed at providing proof-based assurances
for password-based key-derivation has focused on the single-instance case
and the role of iteration as represented by the iteration count c. Our work focuses
on the multi-instance case and the roles of both salting and iteration.

The UNIX password hashing algorithm maps a password pw to $E^c_{pw}(0)$, where
E is a blockcipher and 0 is a constant. Luby and Rackoff [24] show this is a
one-way function when c = 1 and pw is a random blockcipher key. (So their result
does not really cover passwords.) Wagner and Goldberg [36] treat the more
general case of arbitrary c and keys that are passwords, but the goal continues
to be to establish one-wayness, and no security amplification (meaning increase in
security with c) is shown. Boyen [8,9] suggests various ways to enhance security,
including letting users pick their own iteration counts.

Yao and Yin [38] give a natural pseudorandomness definition of a KDF in
which the attacker gets (K, sa) where K is either $H^c(pw \| sa)$ or a random string
of the same length and must determine which. Modeling H as a random oracle
(RO) [7] to which the adversary makes q queries, they claim to prove that the
adversary's advantage is at most q/cN plus a negligible term. This would
establish single-instance security amplification by showing that iteration works as
expected to increase attacker effort.⁶ However, even though salts are considered,

6 Unfortunately, we point out in [6] a bug in the proof of [38, Lemma 2.2] and explain
why the bound claimed by [38, Theorem 1] is wrong. Beyond this, the proof makes
this does not consider multi-instance security, let alone establish multi-instance
security  amplification,  and  their  definition  of  KDF  security  does  not  adapt  to 
allow  this.  (We  use,  as  indicated  above,  an  indifferentiability-style  definition.)  In 
fact  the  KDF  definition  of  [38]  is  not  even  sufficient  to  establish  si  security  of 
password-based  encryption  in  the  case  the  latter,  as  specified  in  PKCS#5,  picks 
a  fresh  salt  for  each  message  encrypted.  Kelsey,  Schneier,  Hall  and  Wagner  [21] 
look  into  the  time  for  password-recovery  attacks  for  different  choices  of  KDFs. 

KDFs  are  for  use  in  non-interactive  settings  like  encryption  with  WinZip. 
The  issues  and  questions  we  consider  do  not  arise  with  password  authenticated 
key  exchange  (PAKE)  [5, 10, 14]  where  definitions  already  guarantee  that  the 
session key may be safely used for encryption. There are no salts and no
amplification issues. Abadi and Warinschi [1] provide a si, key-recovery definition for
PBE security and connect this with symbolic notions. They do not consider mi
security. Dodis, Gennaro, Hastad, Krawczyk and Rabin [12] treat statistically-secure
key derivation using hash functions and block ciphers. As discussed in
depth by Krawczyk [23], these results and techniques aren't useful for
password-based KDFs because passwords aren't large enough, let alone have a sufficient
amount of min-entropy. Krawczyk [23] also notes that his two-stage KDF
approach could be used to build password-based KDFs by replacing the extraction
stage  with  a  key-stretching  operation.  Our  general  framework  may  be  used  to 
analyze  the  mi-security  of  this  construction. 

Work on direct product theorems and XOR lemmas (e.g. [37,15,18,13,27])
has considered the problem of breaking multiple instances of a cryptographic
primitive, in general as an intermediate step to amplifying security in the
single-instance setting. Mi-Xor-security is used in this way in [13,27].

2  The  Multi-Instance  Terrain 

This  section  defines  metrics  of  mi-secure  encryption  and  explores  the  relations 
between  them  to  establish  the  notions  and  results  summarized  in  Figure  1.  Our 
treatment  intends  to  show  that  the  mi  terrain  is  different  from  the  si  one  in 
fundamental  ways,  leading  to  new  definitions,  challenges  and  connections. 

Syntax. Recall that a symmetric encryption scheme is a triple of algorithms
$\mathcal{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$. The key generation algorithm $\mathcal{K}$ outputs a key. The encryption
algorithm $\mathcal{E}$ takes a key K and a message M and outputs a ciphertext
$C \leftarrow_{\$} \mathcal{E}(K, M)$. The deterministic decryption algorithm $\mathcal{D}$ takes K and a
ciphertext C to return either a string or ⊥. Correctness requires that $\mathcal{D}(K, \mathcal{E}(K, M)) =
M$ for all M with probability 1 over $K \leftarrow_{\$} \mathcal{K}$ and the coins of $\mathcal{E}$.
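The syntax can be mirrored in Python as three functions. The scheme below is a toy of our own, chosen only so that correctness can be checked: a random nonce plus an HMAC-SHA256 keystream. It is not a scheme from the text. Decryption returns None to stand in for ⊥ when the ciphertext is too short to parse. The function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

KEY_BYTES, NONCE_BYTES = 16, 16

def keygen() -> bytes:
    """K: sample a uniformly random key."""
    return os.urandom(KEY_BYTES)

def _pad(key: bytes, nonce: bytes, length: int) -> bytes:
    """Keystream of the requested length from HMAC-SHA256(key, nonce || block index)."""
    blocks = (length + 31) // 32
    stream = b"".join(hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
                      for i in range(blocks))
    return stream[:length]

def enc(key: bytes, msg: bytes) -> bytes:
    """E: randomized; fresh coins (the nonce) on every call."""
    nonce = os.urandom(NONCE_BYTES)
    return nonce + bytes(a ^ b for a, b in zip(msg, _pad(key, nonce, len(msg))))

def dec(key: bytes, ctxt: bytes) -> Optional[bytes]:
    """D: deterministic; returns None (standing in for ⊥) on a malformed ciphertext."""
    if len(ctxt) < NONCE_BYTES:
        return None
    nonce, body = ctxt[:NONCE_BYTES], ctxt[NONCE_BYTES:]
    return bytes(a ^ b for a, b in zip(body, _pad(key, nonce, len(body))))
```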

To  illustrate  the  issues  and  choices  in  defining  mi  security,  we  start  with  key 
unrecoverability  which  is  simple  because  it  is  underlain  by  a  computational  game 
and  its  mi  counterpart  is  easily  and  uncontentiously  defined.  When  we  move  to 


6 (continued) some rather large and not fully justified jumps. The special case m = 1 of our
treatment will fill these gaps and recover the main claim of [38].
main UKU_{SE,m}
    K[1], ..., K[m] ←$ 𝒦;  K' ←$ A^{Enc,Cor}
    Ret (K' = K)
proc. Enc(i, M)
    Ret ℰ(K[i], M)
proc. Cor(i)
    Ret K[i]

main LORX_{SE,m}
    K[1], ..., K[m] ←$ 𝒦;  b ←$ {0,1}^m
    b' ←$ A^{Enc,Cor}
    Ret (b' = ⊕_i b[i])

main AND_{SE,m}
    K[1], ..., K[m] ←$ 𝒦;  b ←$ {0,1}^m
    b' ←$ A^{Enc,Cor}
    Ret (b' = b)

proc. Enc(i, M_0, M_1)        (shared by LORX and AND)
    If |M_0| ≠ |M_1| then Ret ⊥
    C ←$ ℰ(K[i], M_{b[i]})
    Ret C
proc. Cor(i)                  (shared by LORX and AND)
    Ret (K[i], b[i])

main RORX_{SE,m}
    K[1], ..., K[m] ←$ 𝒦;  b ←$ {0,1}^m;  b' ←$ A^{Enc,Cor}
    Ret (b' = ⊕_i b[i])
proc. Enc(i, M)
    C_1 ←$ ℰ(K[i], M)
    M_0 ←$ {0,1}^{|M|};  C_0 ←$ ℰ(K[i], M_0)
    Ret C_{b[i]}
proc. Cor(i)
    Ret (K[i], b[i])

Fig. 2. Multi-instance security notions for encryption.


stronger  notions  underlain  by  decisional  games,  definitions  will  get  more  difficult 
and  more  contentious  as  more  choices  will  emerge. 

UKU. Single-instance key unrecoverability is formalized via the game $\mathrm{KU}_{\mathcal{SE}}$,
where a target key $K \leftarrow_{\$} \mathcal{K}$ is initially sampled, and the adversary A is given an
oracle Enc which, on input M, returns $\mathcal{E}(K, M)$. Finally, the adversary is asked
to output a guess K' for the key, and the game returns true if K = K' and
false otherwise. An mi version of the game, $\mathrm{UKU}_{\mathcal{SE},m}$, is depicted in Figure 2.
It picks an m-vector K of target keys, and the oracle Enc now takes i, M to
return $\mathcal{E}(K[i], M)$. The Cor oracle gives the adversary the capability of
corrupting a user to obtain its target key. The adversary's output guess is also an
m-vector K', and the game returns the boolean (K = K'), meaning the
adversary wins only if it recovers all the target keys. (The “U” in “UKU” reflects
this, standing for “Universal.”) The advantage of adversary A is $\mathbf{Adv}^{\mathrm{uku}}_{\mathcal{SE},m}(A) =
\Pr[\mathrm{UKU}_{\mathcal{SE},m} \Rightarrow \mathsf{true}]$. Naturally, this advantage depends on the adversary's
resources. (It could be 1 if the adversary corrupts all instances.) We say that A
is a $(t, \mathbf{q}, q_c)$-adversary if it runs in time t and makes at most $\mathbf{q}[i]$ encryption
queries of the form Enc(i, ·) and makes at most $q_c$ corruption queries. Then
we let $\mathbf{Adv}^{\mathrm{uku}}_{\mathcal{SE},m}(t, \mathbf{q}, q_c) = \max_A \mathbf{Adv}^{\mathrm{uku}}_{\mathcal{SE},m}(A)$, where the maximum is over all
$(t, \mathbf{q}, q_c)$-adversaries.
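The UKU game of Figure 2 can be sketched as a Python harness. The sketch makes three assumptions:

- instances are indexed from 0;
- the toy scheme is ours (a random nonce plus an HMAC-SHA256 pad, for messages of at most 32 bytes);
- the harness also reports the number of corruptions, so that resource use is visible.

The two example adversaries illustrate the remark above. Corrupting every instance wins with certainty. Corrupting all but one instance and guessing the last key essentially never wins. All names are hypothetical.

```python
import hashlib
import hmac
import os

def toy_enc(key: bytes, msg: bytes) -> bytes:
    """Toy randomized encryption for messages of at most 32 bytes (illustration only)."""
    if len(msg) > 32:
        raise ValueError("toy scheme handles at most 32-byte messages")
    nonce = os.urandom(16)
    pad = hmac.new(key, nonce, hashlib.sha256).digest()
    return nonce + bytes(a ^ p for a, p in zip(msg, pad))

def uku_game(adversary, m: int, key_bytes: int = 16):
    """Game UKU_{SE,m}: sample m target keys, run the adversary with Enc and Cor oracles,
    and return (guess equals the whole key vector, number of corruptions)."""
    keys = [os.urandom(key_bytes) for _ in range(m)]
    corrupted = set()

    def enc_oracle(i: int, msg: bytes) -> bytes:
        return toy_enc(keys[i], msg)

    def cor_oracle(i: int) -> bytes:
        corrupted.add(i)
        return keys[i]

    guess = adversary(m, enc_oracle, cor_oracle)
    return list(guess) == keys, len(corrupted)

def corrupt_all(m, enc, cor):
    """Corrupts every instance; its advantage is 1."""
    return [cor(i) for i in range(m)]

def corrupt_all_but_last(m, enc, cor):
    """Corrupts instances 0..m-2 and blindly guesses the last key (all-zero bytes)."""
    enc(m - 1, b"probe")  # Enc queries are allowed, but this one does not help
    return [cor(i) for i in range(m - 1)] + [bytes(16)]
```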

AND. Single-instance indistinguishability for symmetric encryption is usually
formalized via left-or-right security [4]. A random bit b and key $K \leftarrow_{\$} \mathcal{K}$ are
chosen, and an adversary A is given access to an oracle Enc that given equal-length
messages $M_0, M_1$ returns $\mathcal{E}(K, M_b)$. The adversary outputs a bit b' and its
advantage is $2\Pr[b = b'] - 1$. There are several ways one might consider creating
an mi analog. Let us first consider a natural AND-based metric based on game
$\mathrm{AND}_{\mathcal{SE},m}$ of Figure 2. It picks at random a vector $b \leftarrow_{\$} \{0,1\}^m$ of challenge bits
as well as a vector K[1], ..., K[m] of keys, and the adversary is given access to
oracle Enc that on input $i, M_0, M_1$ where $|M_0| = |M_1|$, returns $\mathcal{E}(K[i], M_{b[i]})$.

Additionally, the corruption oracle Cor takes i and returns the pair (K[i], b[i]).
The adversary finally outputs a bit vector b', and wins if and only if b = b'.
(It is equivalent to test that b[i] = b'[i] for all uncorrupted i.) The advantage
of adversary A is $\mathbf{Adv}^{\mathrm{and}}_{\mathcal{SE},m}(A) = \Pr[\mathrm{AND}_{\mathcal{SE},m} \Rightarrow \mathsf{true}] - 2^{-m}$. We say that A
is a $(t, \mathbf{q}, q_c)$-adversary if it runs in time t and makes at most $\mathbf{q}[i]$ encryption
queries of the form Enc(i, ·, ·) and makes at most $q_c$ corruption queries. Then
we let $\mathbf{Adv}^{\mathrm{and}}_{\mathcal{SE},m}(t, \mathbf{q}, q_c) = \max_A \mathbf{Adv}^{\mathrm{and}}_{\mathcal{SE},m}(A)$, where the maximum is over all
$(t, \mathbf{q}, q_c)$-adversaries.

This metric has many points in its favor. By (later) showing that security
under it is implied by security under our preferred LORX metric, we
automatically garner whatever value it offers. But the AND metric also has weaknesses
that in our view make it inadequate as the primary choice. Namely, it does not
capture the hardness of breaking all the uncorrupted instances. For example, an
adversary that corrupts instances 1, ..., m − 1 to get b[1], ..., b[m − 1], makes
a random guess g for b[m] and returns (b[1], ..., b[m − 1], g) has the high
advantage $0.5 - 2^{-m}$ without breaking all instances. We prefer a metric where this
adversary's advantage is close to 0.

LORX. To overcome the above issue with the AND advantage, we introduce
the XOR advantage measure and use it to define LORX. Game $\mathrm{LORX}_{\mathcal{SE},m}$ of
Figure 2 makes its initial choices the same way as game $\mathrm{AND}_{\mathcal{SE},m}$ and provides
the adversary with the same oracles. However, rather than a vector, the adversary
must output a bit b', and wins if this equals $b[1] \oplus \cdots \oplus b[m]$. (It is equivalent
to test that $b' = \bigoplus_{i \in S} b[i]$ where S is the uncorrupted set.) The advantage of
adversary A is $\mathbf{Adv}^{\mathrm{lorx}}_{\mathcal{SE},m}(A) = 2\Pr[\mathrm{LORX}_{\mathcal{SE},m} \Rightarrow \mathsf{true}] - 1$. We say that A is a
$(t, \mathbf{q}, q_c)$-adversary if it runs in time t and makes at most $\mathbf{q}[i]$ encryption queries
of the form Enc(i, ·, ·) and makes at most $q_c$ corruption queries. Then we let
$\mathbf{Adv}^{\mathrm{lorx}}_{\mathcal{SE},m}(t, \mathbf{q}, q_c) = \max_A \mathbf{Adv}^{\mathrm{lorx}}_{\mathcal{SE},m}(A)$, where the maximum is over all $(t, \mathbf{q}, q_c)$-adversaries.
In the example we gave for AND, if an adversary corrupts the first
m − 1 instances to get back b[1], ..., b[m − 1], makes a random guess g for b[m]
and outputs $b' = b[1] \oplus \cdots \oplus b[m-1] \oplus g$, it will have advantage 0.
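The contrast between the two metrics for the corrupt-and-guess adversary can be computed exactly. That adversary never queries Enc, so the scheme plays no role. The Python sketch below therefore just averages over the challenge vector b and the guess g with exact fractions. It reproduces the AND advantage $1/2 - 2^{-m}$ from the AND discussion and the LORX advantage 0. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def corrupt_and_guess(bits, g):
    """The adversary from the text: learn b[1..m-1] via Cor, guess g for b[m]."""
    return list(bits[:-1]) + [g]

def advantages(m: int):
    """Exact (AND advantage, LORX advantage) of that adversary, averaging over
    b in {0,1}^m and a uniform guess g in {0,1}."""
    and_wins = lorx_wins = Fraction(0)
    weight = Fraction(1, 2 ** (m + 1))
    for bits in product((0, 1), repeat=m):
        for g in (0, 1):
            guess = corrupt_and_guess(bits, g)
            and_wins += weight * (guess == list(bits))
            xor_guess = sum(guess) % 2  # b[1] xor ... xor b[m-1] xor g
            lorx_wins += weight * (xor_guess == sum(bits) % 2)
    return and_wins - Fraction(1, 2 ** m), 2 * lorx_wins - 1
```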

RORX. A variant of the si LOR notion, ROR, was given in [4]. Here the
adversary must distinguish between an encryption of a message M it provides and
the encryption of a random message of length |M|. This was shown equivalent
to LOR up to a factor 2 in the advantages [4]. This leads us to define the mi
analog RORX and ask how it relates to LORX. Game $\mathrm{RORX}_{\mathcal{SE},m}$ of Figure 2
makes its initial choices the same way as game $\mathrm{LORX}_{\mathcal{SE},m}$. The adversary is
given access to oracle Enc that on input i, M, returns $\mathcal{E}(K[i], M)$ if b[i] = 1 and
otherwise returns $\mathcal{E}(K[i], M_0)$ where $M_0 \leftarrow_{\$} \{0,1\}^{|M|}$. It also gets the usual Cor
oracle. It outputs a bit b' and wins if this equals $b[1] \oplus \cdots \oplus b[m]$. The advantage
of adversary A is $\mathbf{Adv}^{\mathrm{rorx}}_{\mathcal{SE},m}(A) = 2\Pr[\mathrm{RORX}_{\mathcal{SE},m} \Rightarrow \mathsf{true}] - 1$. We say that A
is a $(t, \mathbf{q}, q_c)$-adversary if it runs in time t and makes at most $\mathbf{q}[i]$ encryption
queries of the form Enc(i, ·) and makes at most $q_c$ corruption queries. Then
we let $\mathbf{Adv}^{\mathrm{rorx}}_{\mathcal{SE},m}(t, \mathbf{q}, q_c) = \max_A \mathbf{Adv}^{\mathrm{rorx}}_{\mathcal{SE},m}(A)$, where the maximum is over all
$(t, \mathbf{q}, q_c)$-adversaries.
DISCUSSION. The multi-user security goal from [3] gives rise to a version of
the above games without corruptions and where all instances share the same
challenge bit $b$, which the adversary tries to guess. But this does not measure
mi security, since recovering a single key suffices to learn $b$.

The  above  approach  extends  naturally  to  providing  a  mi  counterpart  to  any 
security  definition  based  on  a  decisional  game,  where  the  adversary  needs  to 
guess  a  bit  b.  For  example  we  may  similarly  create  mi  metrics  of  CCA  security. 

Why  does  the  model  include  corruptions?  The  following  example  may  help 
illustrate.  Suppose  SE  is  entirely  insecure  when  the  key  has  first  bit  0  and  highly 
secure  otherwise.  (From  the  si  perspective,  it  is  insecure.)  In  the  LORX  game,  an 
adversary  will  be  able  to  figure  out  around  half  the  challenge  bits.  If  we  disallow 
corruptions,  it  would  still  have  very  low  advantage.  From  the  application  point 
of  view,  this  seems  to  send  the  wrong  message.  We  want  LORX-security  to 
mean  that  the  probability  of  “large  scale”  damage  is  low.  But  breaking  half  the 
instances  is  pretty  large  scale.  Allowing  corruptions  removes  this  defect  because 
the  adversary  could  corrupt  the  instances  it  could  not  break  and  then,  having 
corrupted  only  around  half  the  instances,  get  a  very  high  advantage,  breaking 
LORX-security.  In  this  way,  we  may  conceptually  keep  the  focus  on  an  adversary 
goal  of  breaking  all  instances,  yet  cover  the  case  of  breaking  some  threshold 
number  via  the  corruption  capability. 

An  alternative  way  to  address  the  above  issue  without  corruptions  is  to  define 
threshold  metrics  where  the  adversary  wins  by  outputting  a  dynamically  chosen 
set  S  and  predicting  the  xor  of  the  challenge  bits  for  the  indexes  in  S.  This, 
again,  has  much  going  for  it  as  a  metric.  But  LORX  with  corruptions,  as  we 
define  it,  will  imply  security  under  this  metric. 

LORX  IMPLIES  UKU.  In  the  si  setting,  it  is  easy  to  see  that  LOR  security 
implies  KU  security.  The  LOR  adversary  simply  runs  the  KU  adversary.  When 
the  latter  makes  oracle  query  M,  the  LOR  adversary  queries  its  own  oracle  with 
M,  M  and  returns  the  outcome  to  the  KU  adversary.  When  the  latter  returns 
a key $K'$, the LOR adversary submits a last oracle query consisting of a pair
$M_0, M_1$ of random messages to get back a challenge ciphertext $C$, returning 1
if $\mathcal{D}(K', C) = M_1$ and 0 otherwise. A similar but slightly more involved proof
shows that ROR implies KU.
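The si reduction just described can be sketched in Python. Everything here is a hypothetical toy: the scheme XORs each byte with a one-byte key (deliberately insecure, so that a key-recovery adversary exists), `ku_adversary` recovers the key from one encryption of a zero byte, and `lor_adversary` follows the steps above: it answers each KU query $M$ with its own LOR query $(M, M)$, then tests the recovered key on a challenge built from two random messages.

```python
import os


def enc(key: int, msg: bytes) -> bytes:
    # Toy, deliberately insecure scheme: XOR every byte with a 1-byte key.
    return bytes(x ^ key for x in msg)


def dec(key: int, ctxt: bytes) -> bytes:
    return enc(key, ctxt)  # XOR with the same key inverts encryption


def ku_adversary(enc_oracle) -> int:
    # Key recovery: encrypting a zero byte returns the key itself.
    return enc_oracle(b"\x00")[0]


def lor_adversary(lor_oracle) -> int:
    # Step 1: run the KU adversary, answering its query M with LOR query (M, M).
    def simulated_enc(msg: bytes) -> bytes:
        return lor_oracle(msg, msg)

    k_guess = ku_adversary(simulated_enc)
    # Step 2: one last LOR query on two distinct random messages M0, M1.
    m0 = os.urandom(16)
    m1 = os.urandom(16)
    while m1 == m0:
        m1 = os.urandom(16)
    c = lor_oracle(m0, m1)
    # Step 3: output 1 iff the recovered key decrypts the challenge to M1.
    return 1 if dec(k_guess, c) == m1 else 0


def lor_game(adversary) -> bool:
    key = os.urandom(1)[0]
    b = os.urandom(1)[0] & 1

    def lor_oracle(m0: bytes, m1: bytes) -> bytes:
        return enc(key, m1 if b else m0)

    return adversary(lor_oracle) == b
```

Because this toy KU adversary always succeeds, the LOR adversary always wins; in general its success tracks the KU adversary's, up to the small chance that a wrong key still maps the challenge to $M_1$.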

It  is  important  to  establish  analogs  of  these  basic  results  in  the  mi  setting,  for 
they  function  as  “tests”  for  the  validity  of  our  mi  notions.  The  following  shows 
that  LORX  security  implies  UKU.  Interestingly,  it  is  not  as  simple  to  establish 
in the mi case as in the si case. Also, as we will see later, the proof that RORX
implies UKU is not only even more involved but also incurs a factor $2^m$ loss, making
LORX a better choice as the metric to target in designs.

Theorem 1. [LORX ⇒ UKU] Let $\mathrm{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ be a symmetric encryption
scheme with message space $\mathcal{M}$, and let $\ell$ be such that $\{0,1\}^{\ell} \subseteq \mathcal{M}$. Then, for
all $t$, $q_c$, and $q$, and for all $k \ge 1$,

$$\mathbf{Adv}^{\mathrm{uku}}_{\mathrm{SE},m}(t, q, q_c) \le \mathbf{Adv}^{\mathrm{lorx}}_{\mathrm{SE},m}(t', q', q_c) + m \cdot \frac{1}{2^{k\ell} - 1},$$

where $t' = t + O(m \cdot k)$ and $q'[i] = q[i] + k$ for all $i = 1, \ldots, m$. □

The proof is given in [6]. Here, let us stress that Theorem 1 surfaces yet another
subtlety of the mi setting. At first, it would seem that proving the case $k = 1$
of the theorem is sufficient (this is what is usually done in the si case). However,
it is crucial to remark that $\mathbf{Adv}^{\mathrm{lorx}}_{\mathrm{SE},m}(t', q', q_c)$ may be very small. For example,
it is not unreasonable to expect $2^{-128 \cdot m}$ if SE is secure in the single-instance
setting. Yet, assume that $\mathcal{E}$ encrypts 128-bit messages; then we are only able to
set $\ell = 128$, in turn making $m/(2^{\ell} - 1) \approx m \cdot 2^{-128}$ by far the leading term on
the right-hand side. The parameter $k$ hence opens the door to fine tuning of the
additive extra term at the cost of an additive complexity loss in the reduction.
Also note that the reduction in the proof of Theorem 1 is not immediate, as
an adversary guessing all the keys in the UKU game with probability $\epsilon$ only
yields an adversary recovering all the bits $b[1], \ldots, b[m]$ in the LORX game
with probability $\epsilon$. Just outputting the xor of these bits is not sufficient, as
we have to boost the success probability to $(1 + \epsilon)/2$ in order to obtain the desired
relation between the two advantage measures.
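A quick numerical look at the additive term of Theorem 1, as a Python sketch; the values $m = 1000$ and $\ell = 128$ are illustrative assumptions, not taken from the paper. Working in $\log_2$ avoids floating-point underflow for large $k\ell$.

```python
import math


def extra_term_log2(m: int, ell: int, k: int) -> float:
    """log2 of the additive term m / (2^(k*ell) - 1) in Theorem 1."""
    return math.log2(m) - math.log2(2 ** (k * ell) - 1)


for k in (1, 2, 4):
    print(k, round(extra_term_log2(1000, 128, k), 1))
```

With these values the term drops from roughly $2^{-118}$ at $k = 1$ to roughly $2^{-502}$ at $k = 4$, at the price of $k$ extra encryption queries per instance in the reduction.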

In analogy to the si setting, UKU does not imply LORX. Just take a scheme
$\mathrm{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ encrypting $n$-bit messages which is UKU-secure, and modify it
into a scheme $\mathrm{SE}' = (\mathcal{K}', \mathcal{E}', \mathcal{D}')$ where $\mathcal{K}' = \mathcal{K}$ and $\mathcal{E}'(K, M) = \mathcal{E}(K, M) \,\|\, M[0]$,
with $M[0]$ being the first bit of $M$. Clearly, $\mathrm{SE}'$ is still UKU-secure but not
LORX-secure.
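The counterexample can be made concrete with a short Python sketch. The base scheme below is a hypothetical stand-in (a SHA-256-based keystream with a random nonce), chosen only so that the code runs with the standard library; its security is beside the point. For simplicity the sketch appends the first bit of $M$ as a whole extra byte. The LORX adversary submits, for every instance, two messages whose first bits differ, reads $b[i]$ off the leaked bit, and outputs the xor, so it wins with probability 1 (LORX advantage 1) without any corruptions.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]


def enc(key: bytes, msg: bytes) -> bytes:
    # Stand-in base scheme E: nonce || (M xor keystream).
    nonce = os.urandom(16)
    ks = keystream(key, nonce, len(msg))
    return nonce + bytes(a ^ b for a, b in zip(msg, ks))


def enc_prime(key: bytes, msg: bytes) -> bytes:
    # Modified scheme E'(K, M) = E(K, M) || M[0]  (first bit, stored as one byte).
    return enc(key, msg) + bytes([msg[0] >> 7])


def lorx_game(m: int) -> bool:
    keys = [os.urandom(32) for _ in range(m)]
    bits = [os.urandom(1)[0] & 1 for _ in range(m)]

    def enc_oracle(i: int, m0: bytes, m1: bytes) -> bytes:
        return enc_prime(keys[i], m1 if bits[i] else m0)

    # Adversary: first bits of M0 and M1 differ, so the leaked bit equals b[i].
    m0, m1 = b"\x00" * 16, b"\x80" + b"\x00" * 15
    guess = 0
    for i in range(m):
        guess ^= enc_oracle(i, m0, m1)[-1]
    target = 0
    for bit in bits:
        target ^= bit
    return guess == target
```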

As indicated above, a proof that RORX implies UKU is much more involved
and incurs a factor $2^m$ loss. Roughly speaking, this is because in the si case, in
the reduction needed to prove that ROR implies KU, the ROR adversary can
only simulate the execution of the KU adversary correctly in the case where the
bit is 1, i.e., when the encryption oracle returns the actual encryption of a message.
This results in a factor two loss in terms of advantage. Upon translating this
technique to the mi case, the factor 2 becomes $2^m$, as all bits need to be 1 for
the UKU adversary to output the right keys with some guaranteed probability.
However, we will not follow this route for the proof of this result. Instead, we
can obtain the same result by combining Theorem 2 and Theorem 1.

LORX VERSUS RORX. In the si setting, LOR and ROR are the same up to a
factor 2 in the advantage [4]. The implication from LOR to ROR is trivial, and the
converse follows by a simple hybrid argument. We now discuss the relation
between the mi counterparts, namely RORX and LORX, which is both more
complex and more challenging to establish.

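Theorem 2. [RORX ⇒ LORX] Let $\mathrm{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ be a symmetric encryption
scheme. For all $m, t, q_c \ge 0$ and all vectors $q$ we have
$\mathbf{Adv}^{\mathrm{lorx}}_{\mathrm{SE},m}(t, q, q_c) \le 2^m \cdot \mathbf{Adv}^{\mathrm{rorx}}_{\mathrm{SE},m}(t', q, q_c)$, where $t' = t + O(1)$. □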

As discussed in Section 1, the multiplicative factor $2^m$ is often of no harm because
advantages  are  already  exponentially  small  in  m.  The  factor  is  natural,  being  the 
mi  analogue  of  the  factor  2  appearing  in  the  traditional  si  proof,  and  examples 
can  be  given  showing  that  the  bound  is  tight.  The  proof  of  the  above  is  in  [6]. 
The difficulty is adapting the hybrid argument technique to the mi setting. We
omit the much simpler proof of the converse:

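Theorem 3. [LORX ⇒ RORX] Let $\mathrm{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ be a symmetric encryption
scheme. For all $m, t, q_c \ge 0$ and all vectors $q$ we have
$\mathbf{Adv}^{\mathrm{rorx}}_{\mathrm{SE},m}(t, q, q_c) \le \mathbf{Adv}^{\mathrm{lorx}}_{\mathrm{SE},m}(t', q, q_c)$, where $t' = t + O(1)$. □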

LORX IMPLIES AND. Intuitively, one might expect AND security to be a
stronger requirement than LORX security, as the former seems easier to break
than the latter. However, we show that under a fairly minimal requirement,
LORX implies AND. This brings another argument in support of LORX: even
if an application requires AND security, it turns out that proving LORX security
is generally sufficient. The following theorem is to be interpreted as follows. In
general, if we only know that $\mathbf{Adv}^{\mathrm{lorx}}_{\mathrm{SE},m}(t, q, q_c)$ is small, we do not know how
to prove that $\mathbf{Adv}^{\mathrm{and}}_{\mathrm{SE},m}(t', q, q_c)$ is also small (for $t' \approx t$), or whether this is true at
all. As we sketched above, the reason is that we do not know how to use an
adversary $A$ for which the $\mathrm{AND}_{\mathrm{SE},m}$ advantage is large to construct an adversary
for which the $\mathrm{LORX}_{\mathrm{SE},m}$ advantage is large. Still, one would expect that such
an adversary might more easily yield one for which the $\mathrm{LORX}_{\mathrm{SE},k}$ advantage is
sufficiently large, for some $k < m$. The following theorem uses a probabilistic
lemma due to Unger [35] to confirm this intuition.

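Theorem 4. Let $\mathrm{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ be a symmetric encryption scheme. Further,
let $m$, $t$, $q$, and $q_c$ be given, and assume that there exist $C$, $\epsilon$, and $\gamma$ such that
for all $1 \le i \le m$,

$$\max_{S \subseteq \{1, \ldots, m\},\ |S| = i} \mathbf{Adv}^{\mathrm{lorx}}_{\mathrm{SE},i}(t^*_S, q[S], q_c) \le C \cdot \epsilon^{i} + \gamma,$$

where $q[S]$ is the projection of $q$ on the components in $S$, and $t^*_S = t + O(t_{\mathcal{E}} \cdot q[*])$,
$t_{\mathcal{E}}$ denoting the running time needed for one encryption with $\mathcal{E}$.

Then $\mathbf{Adv}^{\mathrm{and}}_{\mathrm{SE},m}(t, q, q_c) \le \gamma + C \cdot \left(\frac{1 + \epsilon}{2}\right)^{m}$. □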
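As a worked step (an elementary estimate of our own, not a statement from the proof), the bound of Theorem 4 can be compared with $2^{-m}$ using $1 + \epsilon \le e^{\epsilon}$:

```latex
\gamma + C\left(\frac{1+\epsilon}{2}\right)^{m}
  = \gamma + C \cdot 2^{-m} (1+\epsilon)^{m}
  \le \gamma + C \cdot 2^{-m} e^{m\epsilon}
  \le \gamma + e \cdot C \cdot 2^{-m}
  \qquad \text{whenever } m\epsilon \le 1 .
```

So when $\epsilon$ is at most about $1/m$, the AND bound decays essentially like $C \cdot 2^{-m}$, plus the additive $\gamma$.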

We are not able to prove that the converse (AND implies LORX) is true in
general, but in the absence of corruptions one can upper bound $\mathbf{Adv}^{\mathrm{lorx}}_{\mathrm{SE},m}(t, q, 0)$
in terms of $\mathbf{Adv}^{\mathrm{and}}_{\mathrm{SE},m'}(t', q', 0)$ for $m' \approx 2m$ and $t'$ and $q'$ being much larger than
$t, q$. The proof, which we omit, follows the lines of the proof of the XOR Lemma
from the Direct Product Theorem given by [18], and relies on the Goldreich-Levin
theorem [17]. As the loss in concrete security in this reduction is very
large, and it only holds in the corruption-free case, we find this an additional
argument to support the usage of the LORX metric.

3  Password-based  Encryption  via  KDFs 

We now turn to our main motivating application, that of password-based encryption
(PBE) as specified in PKCS#5 [32]. The schemes specified there combine
a conventional mode of operation (e.g., CBC mode) with a password-based key
derivation function (KDF). We start with formalizing the latter.
PASSWORD-BASED KDFs. Formally, a $(k, s, c)$-KDF is a deterministic map
$\mathrm{KD}\colon \{0,1\}^* \times \{0,1\}^s \to \{0,1\}^k$ that may make use of an underlying ideal primitive.
Here $c$ is the iteration count, which specifies the multiplicative increase in work
that should slow down brute-force attacks.

PKCS#5 describes two KDFs [32]. We treat the first in detail and discuss
the second in [6]. Let $\mathrm{KD1}^H(pw, sa) = H^c(pw \,\|\, sa)$, where $H^c$ is the function
that composes $H$ with itself $c$ times. To generalize beyond concatenation, we
can define a function $\mathrm{Encode}(pw, sa)$ that describes how to encode its inputs
onto $\{0,1\}^*$, with an efficiently computable inverse Decode.
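A minimal Python sketch of KD1, assuming SHA-256 as the hash $H$ and plain concatenation as the encoding of $(pw, sa)$. SHA-256 is only a stand-in here: PKCS#5's PBKDF1 names MD2, MD5, or SHA-1 and may truncate the output. The function name `kd1` is illustrative.

```python
import hashlib


def kd1(pw: bytes, sa: bytes, c: int) -> bytes:
    """KD1^H(pw, sa) = H^c(pw || sa), with H = SHA-256 (an assumption)."""
    if c < 1:
        raise ValueError("iteration count c must be at least 1")
    x = pw + sa                  # Encode(pw, sa) as plain concatenation
    for _ in range(c):           # c sequential applications of H
        x = hashlib.sha256(x).digest()
    return x                     # 256-bit derived key
```

Each derivation costs $c$ hash evaluations, which is precisely the multiplicative slowdown imposed on a brute-force attacker who must evaluate KD1 once per password guess.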

PBE SCHEMES. A PBE scheme is just a symmetric encryption scheme where we
view the keys as passwords and key generation as a password sampling algorithm.
To highlight when we are thinking of key generation as password sampling we will
use $\mathcal{P}$ to denote key generation (instead of $\mathcal{K}$). We will also write $pw$ for a key that
we think of as a password. Let KD be a $(k, s, c)$-KDF and let $\mathrm{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ be
an encryption scheme with $\mathcal{K}$ outputting uniformly selected $k$-bit keys. Then we
define the PBE scheme $\mathbf{SE}[\mathrm{KD}, \mathrm{SE}] = (\mathcal{P}, \mathbf{E}, \mathbf{D})$ as follows. Encryption $\mathbf{E}(pw, M)$
is done via $sa \leftarrow_{\$} \{0,1\}^s$; $K \leftarrow \mathrm{KD}(pw, sa)$; $C \leftarrow_{\$} \mathcal{E}(K, M)$, returning $(sa, C)$ as
the ciphertext. Decryption recomputes the key $K$ by reapplying the KDF and
then applies $\mathcal{D}$. If the KDF is KD1 and the encryption scheme is CBC mode,
then one obtains the first PBE scheme from PKCS#5 [32].
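The composition can be sketched as follows. This is a hypothetical toy in Python: KD1 is instantiated with SHA-256, and, since the standard library has no block cipher, a SHA-256 counter-mode keystream stands in for the CBC-mode scheme of PKCS#5. It only illustrates the data flow (fresh salt, derived key, ciphertext $(sa, C)$) and is neither secure nor standard-conformant; the 8-byte salt length and all function names are assumptions. The stand-in uses no nonce and relies on each message getting a key derived from a fresh salt.

```python
import hashlib
import os

SALT_LEN = 8  # s, in bytes (assumed)


def kd1(pw: bytes, sa: bytes, c: int) -> bytes:
    x = pw + sa
    for _ in range(c):
        x = hashlib.sha256(x).digest()
    return x


def _xor_keystream(key: bytes, data: bytes) -> bytes:
    # Stand-in for the underlying scheme SE (NOT CBC): XOR with a keystream.
    ks, ctr = b"", 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, ks))


def pbe_encrypt(pw: bytes, msg: bytes, c: int = 1000) -> tuple[bytes, bytes]:
    sa = os.urandom(SALT_LEN)              # sa <-$ {0,1}^s
    key = kd1(pw, sa, c)                   # K <- KD(pw, sa)
    return sa, _xor_keystream(key, msg)    # ciphertext (sa, C)


def pbe_decrypt(pw: bytes, ctxt: tuple[bytes, bytes], c: int = 1000) -> bytes:
    sa, body = ctxt
    key = kd1(pw, sa, c)                   # recompute K from pw and stored salt
    return _xor_keystream(key, body)
```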

PASSWORD GUESSING. We aim to show that security of the above constructions
holds up to the amount of work required to brute-force the passwords output
by $\mathcal{P}$. This raises the question of how we measure the strength of a password
sampler. We will formalize the hardness of guessing passwords output by some
sampler $\mathcal{P}$ via an adaptive guessing game: it challenges an adversary with guessing
passwords adaptively in a setting where the attacker may also adaptively
learn some passwords via a corruption oracle. Concretely, let $\mathrm{GUESS}_{\mathcal{P},m}$ be
the game defined in Figure 3. A $(q_t, q_c)$-guessing adversary is one that makes
at most $q_t$ queries to Test and $q_c$ queries to Cor. An adversary $B$'s guessing
advantage is $\mathbf{Adv}^{\mathrm{guess}}_{\mathcal{P},m}(B) = \Pr[\mathrm{GUESS}^{B}_{\mathcal{P},m} \Rightarrow \mathsf{true}]$. We assume without loss
of generality that the adversary does not make any pointless queries: (1) repeated queries
to Cor on the same value; (2) a query $\mathrm{Test}(i, \cdot)$ following a query $\mathrm{Cor}(i)$;
and (3) a query $\mathrm{Cor}(i)$ after a query $\mathrm{Test}(i, pw)$ that returned true. We also
define a variant of the above guessing game that includes salts and allows an
attacker to test password-salt pairs against all $m$ instances simultaneously. This
will be useful as an intermediate step when reducing to guessing advantage.
The game $\mathrm{saGUESS}_{\mathcal{P},m,p}$ is shown in Figure 3, and we define advantage via
$\mathbf{Adv}^{\mathrm{sa\text{-}guess}}_{\mathcal{P},m,p}(B) = \Pr[\mathrm{saGUESS}^{B}_{\mathcal{P},m,p} \Rightarrow \mathsf{true}]$. An easy argument proves the following
lemma.

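Lemma 5. Let $m, p > 0$, let $\mathcal{P}$ be a password sampler, and let $A$ be a $(q_t, q_c)$-guessing
$\mathrm{saGUESS}_{\mathcal{P},m,p}$ adversary. Then there is a $(q_t, q_c)$-guessing $\mathrm{GUESS}_{\mathcal{P},m}$
adversary $B$ such that $\mathbf{Adv}^{\mathrm{sa\text{-}guess}}_{\mathcal{P},m,p}(A) \le \mathbf{Adv}^{\mathrm{guess}}_{\mathcal{P},m}(B) + m^2 p^2 / 2^s$. □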
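The additive term in Lemma 5 is of the order of the probability that some two of the $mp$ salts ($p$ per instance) collide. The Python sketch below, with illustrative parameter values, compares the exact probability that $N = mp$ uniform $s$-bit salts are not all distinct with $m^2 p^2 / 2^s$; it uses only the standard birthday computation and is our own illustration, not the proof of the lemma.

```python
from fractions import Fraction


def salt_collision_prob(n_salts: int, s_bits: int) -> Fraction:
    """Exact probability that n_salts uniform s-bit salts are not all distinct."""
    space = 2 ** s_bits
    all_distinct = Fraction(1)
    for i in range(n_salts):
        all_distinct *= Fraction(max(space - i, 0), space)
    return 1 - all_distinct


def lemma5_term(m: int, p: int, s_bits: int) -> Fraction:
    """The additive term m^2 p^2 / 2^s of Lemma 5."""
    return Fraction(m * m * p * p, 2 ** s_bits)


m, p, s = 10, 5, 16   # illustrative: 10 instances, 5 salts each, 16-bit salts
print(float(salt_collision_prob(m * p, s)), float(lemma5_term(m, p, s)))
```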

main GUESS_{P,m}:
    pw[1], ..., pw[m] ←$ P
    pw' ←$ B^{Test,Cor}
    Ret ∧_{i=1}^{m} (pw'[i] = pw[i])
proc. Test(i, pw):
    If (pw = pw[i]) then Ret true
    Ret ⊥
proc. Cor(i):
    Ret pw[i]

main saGUESS_{P,m,p}:
    pw[1], ..., pw[m] ←$ P
    For i = 1 to m do
        For j = 1 to p do
            sa[i, j] ←$ {0,1}^s
    pw' ←$ B^{Test,Cor}(sa)
    Ret ∧_{i=1}^{m} (pw'[i] = pw[i])
proc. Test(pw, sa):
    For i = 1 to m do
        For j = 1 to p do
            If (pw, sa) = (pw[i], sa[i, j]) then Ret (i, j)
    Ret (⊥, ⊥)
proc. Cor(i):
    Ret pw[i]

Fig. 3. An adaptive password-guessing game.

SAMPLERS WITH HIGH MIN-ENTROPY. Even though the guessing advantage precisely
quantifies the strength of password samplers, good upper bounds in terms of
the adversary's complexity and of some simpler relevant parameters of a password
sampler are desirable. One interesting case is samplers with high min-entropy.
Formally, we say that $\mathcal{P}$ has min-entropy $\mu$ if for all $pw'$ it holds that
$\Pr[pw = pw'] \le 2^{-\mu}$ over the coins used in choosing $pw \leftarrow_{\$} \mathcal{P}$.

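Theorem 6. Fix $m > q_c \ge 0$ and a password sampler $\mathcal{P}$ with min-entropy
$\mu$. Let $B$ be a $(q_t, q_c)$-adversary for $\mathrm{GUESS}_{\mathcal{P},m}$ making $q_i$ queries of the form
$\mathrm{Test}(i, \cdot)$, with $q_t = q_1 + \cdots + q_m$. Let $\delta = q_t/(m 2^{\mu})$ and let $\gamma = (m - q_c)/m$.
Then $\mathbf{Adv}^{\mathrm{guess}}_{\mathcal{P},m}(B) \le e^{-m \Delta(\gamma, \delta)}$, where $\Delta(\gamma, \delta) = \gamma \ln\frac{\gamma}{\delta} + (1 - \gamma) \ln\frac{1 - \gamma}{1 - \delta}$. □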

Using $\Delta(\gamma, \delta) \ge 2(\gamma - \delta)^2$, we see that to win the guessing game for $q_c$ corruptions,
$q_t \approx (m - q_c) \cdot 2^{\mu}$ Test queries are necessary, and the brute-force attack is optimal.
Note that the above bound is the best we can expect to prove. Indeed, assume for a
moment that we restrict ourselves to adversaries that want to recover a subset
of $m - q_c$ passwords, without corruptions, and make $q_t/m$ queries $\mathrm{Test}(i, \cdot)$ for
each $i$, which are independent from queries $\mathrm{Test}(j, \cdot)$ for other $j \ne i$. Then each
individual password is found, independently, with probability at most $q_t/(m \cdot 2^{\mu})$,
and if one applies the Chernoff bound, the probability that a subset of size $m - q_c$
of the passwords is retrieved is upper bounded by $e^{-m \Delta(\gamma, \delta)}$. In our case, we
have additional challenges. Foremost, queries for each $i$ are not independent.
Also, the number of queries may not be the same for each index $i$. And finally,
we allow for corruption queries.
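For the idealized independent attack just described, the Chernoff-type bound can be checked numerically. The Python sketch below computes the exact probability that at least $\gamma m$ of $m$ passwords are found when each is found independently with probability $\delta$, and compares it with $e^{-m\Delta(\gamma,\delta)}$ and with the weaker Pinsker-based estimate $e^{-2m(\gamma-\delta)^2}$. Parameter values are illustrative, the comparison is meant for $\delta < \gamma$, and it does not model the adaptive, corruption-enabled game of Theorem 6.

```python
import math


def kl_delta(gamma: float, delta: float) -> float:
    """Delta(gamma, delta) = gamma ln(gamma/delta) + (1-gamma) ln((1-gamma)/(1-delta))."""
    total = 0.0
    if gamma > 0:
        total += gamma * math.log(gamma / delta)
    if gamma < 1:
        total += (1 - gamma) * math.log((1 - gamma) / (1 - delta))
    return total


def exact_tail(m: int, delta: float, need: int) -> float:
    """Pr[Binomial(m, delta) >= need]: at least `need` of m passwords found."""
    return sum(math.comb(m, j) * delta ** j * (1 - delta) ** (m - j)
               for j in range(need, m + 1))


m, q_c = 100, 20
delta = 0.3               # per-password success probability q_t / (m * 2^mu)
gamma = (m - q_c) / m     # fraction of passwords the adversary must recover
print(exact_tail(m, delta, m - q_c),            # exact probability
      math.exp(-m * kl_delta(gamma, delta)),    # e^{-m Delta(gamma, delta)}
      math.exp(-2 * m * (gamma - delta) ** 2))  # Pinsker-based estimate
```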

The  full  proof  of  Theorem  6  is  given  in  [6].  At  a  high  level,  it  begins  by 
showing how to move to a simpler setting in which the adversary wins by recovering
a subset of the passwords without the aid of a corruption oracle. The
resulting setting is an example of a threshold direct product game. This allows
us to apply a generalized Chernoff bound due to Panconesi and Srinivasan [31]
(see also [20]) that reduces threshold direct product games to (non-threshold)
direct product games. Finally, we apply an amplification lemma due to Maurer,
Pietrzak, and Renner [25] that yields a direct product theorem for the password
guessing game. Let us also note that, using the same technique, the better
bound $\mathbf{Adv}^{\mathrm{guess}}_{\mathcal{P},m}(B) \le (q_t/(m 2^{\mu}))^{m}$ can be proven for the special case of $(q_t, 0)$-adversaries.


CORRELATED PASSWORDS. By taking independent samples from $\mathcal{P}$ we have captured
only the setting of independent passwords. In practice, of course, passwords
may be correlated across users or, at least, user accounts. Our results extend to
the setting of jointly selecting a vector of $m$ passwords, except of course the
analysis of the guessing advantage (whose proof fundamentally relies upon
independence). This last point only limits our ability to measure, in terms of simpler
metrics like min-entropy, the difficulty of a guessing game against correlated
passwords. It does not decrease the security proven, as the simulation-based
paradigm we introduce below allows one to reduce to the full difficulty of the
guessing game.


SIMULATION-BASED SECURITY FOR KDFs. We define an ideal-functionality style
notion of security for KDFs. Figure 4 depicts two games. A message sampler $\mathcal{M}$ is
an algorithm that takes as input a number $r$ and outputs a pair of vectors $(pw, sa)$,
each having $r$ elements and with $|sa[i]| = s$ for $1 \le i \le r$. A simulator $S$ is
a randomized, stateful procedure. It expects oracle access to a procedure Test
to which it can query a message. Game $\mathrm{Real}_{\mathrm{KD},\mathcal{M},r}$ gives a distinguisher $D$ the
messages and associated derived keys. Also, $D$ can adaptively query the ideal
primitive $H$ underlying KD. Game $\mathrm{Ideal}_{S,\mathcal{M},r}$ gives $D$ the messages and keys
chosen uniformly at random. Now $D$ can adaptively query a primitive oracle
implemented by a simulator $S$ that, itself, has access to a Test oracle. Then we
define KDF advantage by

$$\mathbf{Adv}^{\mathrm{kdf}}_{\mathrm{KD},\mathcal{M},r}(D, S) = \Pr\left[\mathrm{Real}^{D}_{\mathrm{KD},\mathcal{M},r} \Rightarrow 1\right] - \Pr\left[\mathrm{Ideal}^{D}_{S,\mathcal{M},r} \Rightarrow 1\right].$$

To be useful, we will require proving that there exists a simulator $S$ such that
for any $(D, \mathcal{M})$ pair the KDF advantage is "small".

This  notion  is  equivalent  to  applying  the  indifferentiability  framework  [26] 
to  a  particular  ideal  KDF  functionality.  That  functionality  chooses  messages 
according to an algorithm $\mathcal{M}$ and outputs on its honest interface the messages
and  uniform  keys  associated  to  them.  On  the  adversarial  interface  is  the  test 
routine  which  allows  the  simulator  to  learn  keys  associated  to  messages.  This 
raises  the  question  of  why  not  just  use  indifferentiability  from  a  RO  as  our 
target security notion. The reasons are twofold. First, it is not clear that $H^c$
is  indifferentiable  from  a  random  oracle.  Second,  even  if  it  were,  a  proof  would 
seem  to  require  a  simulator  that  makes  at  least  the  same  number  of  queries 
to  the  RO  as  it  receives  from  the  distinguisher.  This  rules  out  showing  security 
amplification  due  to  the  iteration  count  c.  Our  approach  solves  both  issues,  since 
we will show KDF security for simulators that make one call to Test for every $c$ queries
made  to  it.  For  example,  our  simulator  for  KD1  will  only  query  Test  if  a  chain  of 
c  hashes  leads  to  the  being-queried  point  X  and  this  chain  is  not  a  continuation 
of  some  longer  chain.  We  formally  capture  this  property  of  simulators  next. 
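The chain-completion idea can be sketched in Python for KD1. This is an illustrative, simplified sketch, not the simulator analyzed in [6]: it lazily samples the random oracle, walks each new query back through earlier answers, and calls Test only when the query is the $c$-th query of a chain that starts at a decodable $\mathrm{Encode}(pw, sa)$ and is not the continuation of a longer chain; it does not check the uniqueness conditions of the formal predicate defined next. The length-prefixed `encode`/`decode`, the 8-byte salt length, `KD1Simulator`, and `make_ideal_test` are all hypothetical.

```python
import os

SALT_LEN = 8  # s in bytes (assumed)


def encode(pw: bytes, sa: bytes) -> bytes:
    # Hypothetical injective encoding: 2-byte length prefix, pw, then sa.
    return len(pw).to_bytes(2, "big") + pw + sa


def decode(w: bytes):
    if len(w) < 2:
        return None
    n = int.from_bytes(w[:2], "big")
    if len(w) != 2 + n + SALT_LEN:
        return None
    return w[2:2 + n], w[2 + n:]


class KD1Simulator:
    """Lazily sampled random oracle that calls Test only on chain completions."""

    def __init__(self, test, c: int, n_bytes: int = 32):
        self.test, self.c, self.n = test, c, n_bytes
        self.table = {}       # X -> Y = H(X)
        self.pred = {}        # Y -> X, to walk chains backwards
        self.test_calls = 0

    def _walk_back(self, x: bytes):
        length, cur = 0, x
        while cur in self.pred and length < self.c:
            cur, length = self.pred[cur], length + 1
        return cur, length

    def prim(self, x: bytes) -> bytes:
        if x in self.table:
            return self.table[x]
        y = None
        start, length = self._walk_back(x)
        # x is the c-th query of a chain starting at start, and start has no
        # predecessor (otherwise the walk would have continued past c - 1).
        if length == self.c - 1:
            pw_sa = decode(start)
            if pw_sa is not None:
                self.test_calls += 1
                y = self.test(*pw_sa)
        if y is None:
            y = os.urandom(self.n)
        self.table[x] = y
        self.pred.setdefault(y, x)
        return y


def make_ideal_test(pws, sas, k_bytes: int = 32):
    """Ideal-game Test oracle: uniform keys; returns K[i], or None for ⊥."""
    keys = [os.urandom(k_bytes) for _ in pws]

    def test(pw: bytes, sa: bytes):
        for pw_i, sa_i, k_i in zip(pws, sas, keys):
            if (pw_i, sa_i) == (pw, sa):
                return k_i
        return None

    return keys, test
```

For example, with $c = 3$ a distinguisher that honestly evaluates KD1 (querying $\mathrm{Encode}(pw, sa)$, then its answer, then the next answer) receives exactly the ideal key on its third query, and the simulator makes a single Test call.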


main Real_{KD,M,r}:
    (pw, sa) ←$ M(r)
    For i = 1 to r do
        K[i] ← KD^H(pw[i], sa[i])
    b' ←$ D^{Prim}(pw, sa, K)
    Ret b'
proc. Prim(X):
    Ret H(X)

main Ideal_{S,M,r}:
    (pw, sa) ←$ M(r)
    For i = 1 to r do
        K[i] ←$ {0,1}^k
    b' ←$ D^{Prim}(pw, sa, K)
    Ret b'
proc. Prim(X):
    Ret S^{Test}(X)
sub. Test(pw, sa):
    For i = 1 to r do
        If (pw[i], sa[i]) = (pw, sa) then Ret K[i]
    Ret ⊥

Fig. 4. Games for the simulation-based security notion for KDFs.

c-AMPLIFYING SIMULATORS. Let $\tau = (X_1, Y_1), \ldots, (X_q, Y_q)$ be a (possibly partial)
transcript of Prim queries and responses. We restrict attention to $(k, s, c)$-KDFs
for which we can define a predicate $\mathrm{final}_{\mathrm{KD}}(X_i, \tau)$ which evaluates to true
if there exists exactly one sequence of $c$ indices $j_1 < \cdots < j_c$ such that (1) $j_c = i$,
(2) there exists a unique $(pw, sa)$ such that evaluating $\mathrm{KD}^H(pw, sa)$ when $H$ is such
that $Y_j = H(X_j)$ for $1 \le j \le i$ results exactly in the queries $X_{j_1}, \ldots, X_{j_c}$ in any
order, where $X_i$ is the last query, and (3) $\mathrm{final}_{\mathrm{KD}}(X_{j_r}, \tau) = \mathsf{false}$ for all $r < c$.

Our simulators only query Test on queries $X_i$ for which $\mathrm{final}_{\mathrm{KD}}(X_i, \tau) = \mathsf{true}$;
we call such queries KD-completion queries, and simulators satisfying this are
called c-amplifying. Note that (3) implies that there are at most $q/c$ total
KD-completion queries in a $q$-query transcript.

HASH-DEPENDENT PASSWORDS. We do not allow $\mathcal{M}$ access to the random oracle
$H$. This removes hash-dependent passwords from consideration. Our results
should extend to cover hash-dependent passwords if one has explicit domain separation
between use of $H$ during password selection and during key derivation.
Otherwise, an indifferentiability-style approach as we use here will not work, due
to limitations pointed out in [33]. A full analysis of the hash-dependent password
setting would therefore appear to require direct analysis of PBE schemes without
taking advantage of the modularity provided by simulation-based approaches.

SECURITY OF KD1. For a message sampler $\mathcal{M}$, let
$\gamma(\mathcal{M}, r) := \Pr[\exists\, i \ne j : (pw[i], sa[i]) = (pw[j], sa[j])]$ where $(pw, sa) \leftarrow_{\$} \mathcal{M}(r)$.
We prove the following theorem in [6].

Theorem 7. Fix $r > 0$. Let KD1 be as above. There exists a simulator $S$ such
that for all adversaries $D$ making $q$ RO queries, of which $q_c$ are chain completion
queries, and all message samplers $\mathcal{M}$,

$$\mathbf{Adv}^{\mathrm{kdf}}_{\mathrm{KD1},\mathcal{M},r}(D, S) \le 4\gamma(\mathcal{M}, r) + \frac{2r^2 + 7(2q + rc)^2}{2^n}.$$

The simulator $S$ makes at most $q_c$ Test queries, and answers each query in time
$O(c)$. □
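A small Python helper evaluating the right-hand side of Theorem 7 in $\log_2$. The parameters ($n = 256$ as for SHA-256, $r = 2^{20}$ derived keys, $c = 10^4$ iterations, and a sampler with $\gamma(\mathcal{M}, r) = 0$) are illustrative assumptions.

```python
import math


def kd1_bound_log2(q: int, r: int, c: int, n: int, gamma: float = 0.0) -> float:
    """log2 of 4*gamma + (2 r^2 + 7 (2q + r c)^2) / 2^n from Theorem 7."""
    collision_part = (2 * r * r + 7 * (2 * q + r * c) ** 2) / 2 ** n
    return math.log2(4 * gamma + collision_part)


for log_q in (64, 80, 127):
    print(log_q, round(kd1_bound_log2(2 ** log_q, 2 ** 20, 10 ** 4, 256), 1))
```

With these values the bound is about $2^{-91}$ for $q = 2^{80}$ and exceeds 1 (so it says nothing) at $q = 2^{127}$: the $(2q + rc)^2/2^n$ term limits the guarantee to roughly $2^{n/2}$ queries.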


SECURITY OF PBE. We are now in a position to analyze the security of password-based
encryption as used in PKCS#5. The following theorem, proved in [6],
uses the multi-user left-or-right security notion from [3], whose formalization is
recalled in [6]:


14.  Multi-Instance  Security 




Mihir  Bellare,  Thomas  Ristenpart,  and  Stefano  Tessaro 


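Theorem 8. Let $m \ge 1$, let $\mathbf{SE}[\mathrm{KD}, \mathrm{SE}] = (\mathcal{P}, \mathbf{E}, \mathbf{D})$ be the encryption scheme
built from a $(k, s, c)$-KDF KD and an encryption scheme $\mathrm{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ with
$k$-bit keys. Let $A$ be an adversary making $p$ queries to $\mathrm{Enc}(i, \cdot, \cdot)$ for each
$i \in \{1, \ldots, m\}$ and making at most $q_c < m$ corruption queries. Let $S$ be a
c-amplifying simulator. Then there exist a message sampler $\mathcal{M}$ and adversaries
$D$, $C$, and $B$ such that

$$\mathbf{Adv}^{\mathrm{lorx}}_{\mathbf{SE}[\mathrm{KD},\mathrm{SE}],m}(A) \le m \cdot \mathbf{Adv}^{\mathrm{mu\text{-}lor}}_{\mathrm{SE},p}(C) + 2 \cdot \mathbf{Adv}^{\mathrm{guess}}_{\mathcal{P},m}(B) + 2 \cdot \mathbf{Adv}^{\mathrm{kdf}}_{\mathrm{KD},\mathcal{M},mp}(D, S).$$

If $A$ makes $q$ queries to $H$, then: $D$ makes at most $q$ queries to its $H$ oracle;
$B$ makes at most $\lceil q/c \rceil$ queries to Test and at most $q_c$ corruption queries; and
$C$ makes a single query $\mathrm{Enc}(i, \cdot, \cdot)$ for each $1 \le i \le p$. Moreover, $C$'s running
time equals $t_A + q \cdot t_S$ plus a small, absolute constant, where $t_A$ is the
running time of $A$ and $t_S$ is the time needed by $S$ to answer a query. Finally,
$\gamma(\mathcal{M}, mp) \le m^2 p^2 / 2^s$. □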

Note that the theorem holds even when SE is only one-time secure (meaning
it can be deterministic), which implies that the analysis covers tools such as
WinZip (cf. [22]). In terms of the bound we achieve, Theorem 7 for KD1 shows
that an adversary that makes $\mathbf{Adv}^{\mathrm{kdf}}_{\mathrm{KD1},\mathcal{M},mp}(D, S)$ large requires $q \approx 2^{n/2}$ queries
to $H$, provided salts are large. If $H$ is SHA-256, then this is about $2^{128}$ work.
Likewise, a good choice of SE will ensure that $\mathbf{Adv}^{\mathrm{mu\text{-}lor}}_{\mathrm{SE},p}(C)$ is very small.
Thus the dominating term ends up being the guessing advantage of $B$ against $\mathcal{P}$, which
measures its ability to guess $m - q_c$ passwords.
Abstract. We show that the second iterate $H^2(M) = H(H(M))$ of a
random oracle $H$ cannot achieve strong security in the sense of
indifferentiability from a random oracle. We do so by proving that
indifferentiability for $H^2$ holds only with poor concrete security, by providing a
lower bound (via an attack) and a matching upper bound (via a proof
requiring new techniques) on the complexity of any successful simulator.
We  then  investigate  HMAC  when  it  is  used  as  a  general-purpose  hash 
function  with  arbitrary  keys  (and  not  as  a  MAC  or  PRF  with  uniform, 
secret  keys).  We  uncover  that  HMAC’s  handling  of  keys  gives  rise  to 
two  types  of  weak  key  pairs.  The  first  allows  trivial  attacks  against  its 
indifferentiability;  the  second  gives  rise  to  structural  issues  similar  to 
that which ruled out strong indifferentiability bounds in the case of $H^2$.
However,  such  weak  key  pairs  do  not  arise,  as  far  as  we  know,  in  any 
deployed applications of HMAC. For example, using keys of any fixed
length shorter than $d - 1$, where $d$ is the block length in bits of the
underlying hash function, completely avoids weak key pairs. We therefore
conclude with a positive result: a proof that HMAC is indifferentiable
from a RO (with standard, good bounds) when applications use keys of
a fixed length less than $d - 1$.

Keywords:  Indifferentiability,  Hash  functions,  HMAC. 


1  Introduction 

Cryptographic hash functions such as those in the MD and SHA families are
constructed by extending the domain of a fixed-input-length compression function
via the Merkle-Damgård (MD) transform. This applies some padding to a
message and then iterates the compression function over the resulting string to
compute a digest value. Unfortunately, hash functions built this way are vulnerable
to extension attacks that abuse the iterative structure underlying MD [22, 34]:
given the hash of a message $H(M)$, an attacker can compute $H(M \,\|\, X)$ for some
arbitrary $X$, even without knowing $M$.
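The extension attack can be demonstrated on a toy Merkle-Damgård hash in Python. Everything here is a hypothetical stand-in: SHA-256 is used as a black-box compression function on 32-byte blocks, lengths are counted in bytes, and the padding only mimics MD strengthening; this is not how SHA-256 is built internally. The forged message is really $M \,\|\, \mathit{glue} \,\|\, X$, where the glue is $M$'s own padding, which the prose abbreviates as $M \,\|\, X$.

```python
import hashlib

BLOCK = 32          # toy block length in bytes (assumed)
IV = bytes(BLOCK)   # toy initial chaining value (assumed)


def compress(h: bytes, block: bytes) -> bytes:
    # Toy compression function f: {0,1}^256 x {0,1}^256 -> {0,1}^256.
    return hashlib.sha256(h + block).digest()


def md_pad(msg_len: int) -> bytes:
    # MD strengthening: 0x80, zero bytes, then the 8-byte message length.
    zeros = (-(msg_len + 9)) % BLOCK
    return b"\x80" + b"\x00" * zeros + msg_len.to_bytes(8, "big")


def md_hash(msg: bytes, iv: bytes = IV, prefix_len: int = 0) -> bytes:
    # prefix_len lets the iteration resume from an intermediate chaining value.
    data = msg + md_pad(prefix_len + len(msg))
    h = iv
    for i in range(0, len(data), BLOCK):
        h = compress(h, data[i:i + BLOCK])
    return h


def extend(digest: bytes, msg_len: int, suffix: bytes) -> tuple[bytes, bytes]:
    """Given only H(M) and |M|, return (glue, H(M || glue || suffix))."""
    glue = md_pad(msg_len)
    return glue, md_hash(suffix, iv=digest, prefix_len=msg_len + len(glue))
```

The attack works because the digest of $M$ is exactly the chaining value after processing $M \,\|\, \mathit{glue}$, so hashing can simply continue from it.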

In response, suggestions for shoring up the security of MD-based hash functions
were made. The simplest is due to Ferguson and Schneier [20], who advocate
a hash-of-hash construction: $H^2(M) = H(H(M))$, the second iterate of $H$. An
earlier  example  is  HMAC  [5],  which  similarly  applies  a  hash  function  H  twice, 
and  can  be  interpreted  as  giving  a  hash  function  with  an  additional  key  input. 
Both  constructions  enjoy  many  desirable  features:  they  use  H  as  a  black  box,  do 
not  add  large  overheads,  and  appear  to  prevent  the  types  of  extension  attacks 
that  plague  MD-based  hash  functions. 

Still, the question remains whether they resist other attacks. More generally,
we would like $H^2$ and HMAC to behave like random oracles (ROs). In this
paper, we provide the first analysis of these functions as being indifferentiable
from ROs in the sense of [13, 29], which (if true) would provably rule out most
structure-abusing attacks. Our main results surface a seemingly paradoxical fact:
the hash-of-hash $H^2$ cannot be indifferentiable from a RO with good bounds,
even if $H$ is itself modeled as a keyed RO. We then explore the fallout, which
also affects HMAC.

Indifferentiability.  Coron  et  al.  [13]  suggest  that  hash  functions  be  designed 
so  that  they  “behave  like”  a  RO.  To  define  this,  they  use  the  indifferentiability 
framework  of  Maurer  et  al.  [29].  Roughly,  this  captures  that  no  adversary  can 
distinguish  between  a  pair  of  oracles  consisting  of  the  construction  (e.g.,  H2)  and 
its  underlying  ideal  primitive  (an  ideal  hash  H)  and  the  pair  of  oracles  consisting 
of  a  RO  and  a  simulator  (which  is  given  access  to  the  RO).  A  formal  definition 
is  given  in  Section  2.  Indifferentiability  is  an  attractive  goal  because  of  the  MRH 
composition theorem [29]: if a scheme is secure when using a RO, it is also secure
when the RO is replaced by a hash construction that is indifferentiable from a
RO. The MRH theorem is widely applicable (but not ubiquitously, cf. [31]),
and  so  showing  indifferentiability  provides  broad  security  guarantees. 

While there exists a large body of work showing various hash constructions to be indifferentiable from a RO (cf. [1,7,11-13,15,16,23]), none has yet analyzed either $H^2$ or HMAC. Closest is the confusingly named HMAC construction from [13], which hashes a message $M$ by computing $H^2(0^d \| M)$, where $H$ is MD using a compression function with block size $d$ bits. This is neither HMAC proper nor $H^2$, but it seems close enough to both that one would expect the proofs of security given in [13] to apply to all three.


1.1  The  Second  Iterate  Paradox 

Towards refuting the above intuition, consider that $H^2(H(M)) = H(H^2(M))$. This implies that an output of the construction, $H^2(M)$, can be used as an intermediate value to compute the hash of the message $H(M)$. This property does not exist in typical indifferentiable hash constructions, which purposefully ensure that construction outputs are unlikely to coincide with intermediate values. However, unlike in settings where extension attacks apply (they, too, take advantage of outputs being intermediate values), there is no obvious way to distinguish $H^2$ from a RO.

Our first technical contribution, then, is detailing how this structural property might give rise to vulnerabilities. Consider computing a hash chain of length $\ell$ using $H^2$ as the hash function, that is, computing $Y = H^{2\ell}(M)$. Doing so requires $2\ell$ applications of $H$. But the structural property of $H^2$ identified above means that, given $M$ and $Y$, one can compute $H^{2\ell}(H(M))$ using only one application of $H$: $H(Y) = H(H^{2\ell}(M)) = H^{2\ell}(H(M))$. Moreover, the values computed along the two hash chains, namely the values $Y_i \gets H^{2i}(M)$ and $Y'_i \gets H^{2i}(H(M))$ for $0 \le i \le \ell$, are disjoint with overwhelming probability (when $\ell$ is not unreasonably large). Note that for chains of RO applications, attempting to cheaply compute such a second chain would not lead to disjoint chains. This demonstrates a way in which a RO and $H^2$ differ.
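A minimal sketch of this shortcut, assuming SHA-256 from Python's `hashlib` as a stand-in for $H$ (the analysis itself models $H$ as a random oracle); `CountingHash` and `h2_chain_end` are hypothetical helpers introduced only for illustration. It counts applications of $H$ for the honest chain and for the one-call shortcut, then confirms the shortcut against the second chain computed the long way.

```python
import hashlib


class CountingHash:
    """SHA-256 as a stand-in for the ideal hash H; counts applications of H."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: bytes) -> bytes:
        self.calls += 1
        return hashlib.sha256(x).digest()


def h2_chain_end(h: CountingHash, start: bytes, ell: int) -> bytes:
    """Return H^{2*ell}(start): ell applications of the second iterate H^2."""
    y = start
    for _ in range(ell):
        y = h(h(y))
    return y


ell = 1000
M = b"example message"

honest = CountingHash()
Y = h2_chain_end(honest, M, ell)        # honest chain end: 2*ell calls to H

cheater = CountingHash()
shortcut = cheater(Y)                   # H(Y): a single call to H

# Compute H^{2*ell}(H(M)) the long way, only to confirm the shortcut.
check = CountingHash()
expected = h2_chain_end(check, check(M), ell)

print(honest.calls, cheater.calls, shortcut == expected)   # 2000 1 True
```

The cheater's single call produces exactly the end of the second chain that would otherwise cost $2\ell$ calls.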

We exhibit a cryptographic setting, called mutual proofs of work, in which the highlighted structure of $H^2$ can be exploited. In mutual proofs of work, two parties prove to each other that they have expended some asserted amount of computational effort. This task is inspired by, and similar to, client puzzles [18,19,24,25,33] and puzzle auctions [35]. We give a protocol for mutual proofs of work whose computational task is computing hash chains. This protocol is secure when using a random oracle, but when using $H^2$ instead, an attacker can cheat by abusing the structural properties discussed above.

Indifferentiability lower bound. The mutual proofs of work example already points to the surprising fact that $H^2$ does not “behave like” a RO. In fact, it does more, ruling out proofs of indifferentiability for $H^2$ with good bounds. (The existence of a tight proof of indifferentiability combined with the composition theorem of [29] would imply security for mutual proofs of work, yielding a contradiction.) However, we find that the example does not surface well why simulators must fail, and the subtlety of the issues here prompts further investigation. We therefore provide a direct negative result in the form of an indifferentiability distinguisher. We prove that should the distinguisher make $q_1, q_2$ queries to its two oracles, then for any simulator making $q_S$ queries, the indifferentiability advantage of the distinguisher is lower-bounded by $1 - q_S/(q_1 q_2) - q_S^2/2^n$. (This is slightly simpler than the real bound; see Section 3.2.) What this lower bound states is that the simulator must make very close to $\min\{q_1 q_2, 2^{n/2}\}$ queries to prevent this distinguisher's success. The result extends to structured underlying hash functions $H$ as well, for example should $H$ be MD-based.

To the best of our knowledge, our results are the first to show lower bounds on the number of queries an indifferentiability simulator must use. That a simulator must make a large number of queries hinders the utility of indifferentiability. When one uses the MRH composition theorem, the security of a scheme when using a monolithic RO must hold up to the number of queries the simulator makes. For example, in settings where one uses a hash function needing to be collision-resistant and attempts to conclude security via some (hypothetical) indifferentiability bound, our results indicate that the resulting security bound for the application can be at most $2^{n/4}$ instead of the expected $2^{n/2}$: a simulator answering $q$ distinguisher queries must itself make roughly $q^2$ queries, and collision resistance of the monolithic RO only holds up to about $2^{n/2}$ of those.

Upper bounds for second iterates. We have ruled out good upper bounds on indifferentiability, but the question remains whether weak bounds exist. We provide proofs of indifferentiability for $H^2$ that hold up to about $2^{n/4}$ distinguisher queries (our lower bounds rule out doing better) when $H$ is a RO. We provide some brief intuition about the proof. Consider an indifferentiability adversary making at most $q_1, q_2$ queries. The adversarial strategy of import is to compute long chains using the left oracle, and then try to “catch” the simulator in an inconsistency by querying it on a value at the end of the chain and, afterwards, filling in the intermediate values via further left and right queries. But the simulator can avoid being caught if it prepares long chains itself to help it answer queries consistently. Intuitively, as long as the simulator's chains are a bit longer than $q_1$ hops, the adversary cannot build a longer chain itself (being restricted to at most $q_1$ queries) and will never win. The full proofs of these results are quite involved, and so we defer more discussion until the body. We are unaware of any indifferentiability proofs that require this kind of nuanced strategy by the simulator.


1.2  HMAC  with  Arbitrary  Keys 

HMAC was introduced by Bellare, Canetti, and Krawczyk [5] to be used as a pseudorandom function or message authentication code. It uses an underlying hash function $H$; let $H$ have block size $d$ bits and output length $n$ bits. Computing a hash $\mathrm{HMAC}(K, M)$ works as follows [26]. If $|K| > d$, then redefine $K \gets H(K)$. Let $K'$ be $K$ padded with sufficiently many zeros to get a $d$-bit string. Then $\mathrm{HMAC}(K, M) = H(K' \oplus \mathrm{opad} \,\|\, H(K' \oplus \mathrm{ipad} \,\|\, M))$, where opad and ipad are distinct $d$-bit constants. The original (provable security) analyses of HMAC focus on the setting where the key $K$ is honestly generated and secret [3,5]. But HMAC's speed, ubiquity, and assumed security properties have led it to be used in a wide variety of settings.
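The HMAC computation above can be sketched in Python, assuming $H$ = SHA-256, so that $d = 512$ bits (64 bytes), and the ipad/opad values 0x36 and 0x5c (each repeated to $d$ bits) fixed in RFC 2104. The sketch works on byte strings rather than arbitrary bit strings, and `hmac_sha256` is a hypothetical name; its output is compared against Python's standard `hmac` module.

```python
import hashlib
import hmac

BLOCK = 64                       # d = 512 bits: the block size of SHA-256, in bytes
IPAD = bytes([0x36]) * BLOCK     # RFC 2104 inner pad
OPAD = bytes([0x5C]) * BLOCK     # RFC 2104 outer pad


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC(K, M) = H(K' xor opad || H(K' xor ipad || M)) with H = SHA-256."""
    if len(key) > BLOCK:                     # |K| > d: redefine K <- H(K)
        key = hashlib.sha256(key).digest()
    k_prime = key.ljust(BLOCK, b"\x00")      # pad with zeros to a d-bit string K'
    inner = hashlib.sha256(xor(k_prime, IPAD) + msg).digest()
    return hashlib.sha256(xor(k_prime, OPAD) + inner).digest()


for key in (b"short key", bytes(range(64)), b"k" * 100):
    ours = hmac_sha256(key, b"message")
    ref = hmac.new(key, b"message", hashlib.sha256).digest()
    print(len(key), ours == ref)             # prints True for each key length
```

Note how the key-handling step (hash if too long, then zero-pad) maps keys of different lengths onto the same $d$-bit string $K'$, which is the source of the weak key pairs discussed below.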

Of particular relevance are settings in which existing (or potential) proofs of security model HMAC as a keyed RO, a function that maps each key-message pair to an independent and uniform point. There are many examples of such settings. The HKDF scheme builds from HMAC a general-purpose key derivation function [27,28] that uses as key a public, uniformly chosen salt. When HKDF is used with a source of sufficiently high entropy, Krawczyk proves security using standard-model techniques; otherwise, he proves security assuming HMAC is a keyed RO [28]. PKCS#5 standardizes password-based key derivation functions that use HMAC with the key being a (low-entropy) password [30]. Recent work provides the first proofs of security when modeling HMAC as a RO [9]. Ristenpart and Yilek [32], in the context of hedged cryptography [4], use HMAC in a setting whose cryptographic security models allow adversarially specified keys. Again, proofs model HMAC as a keyed RO.

As  mentioned  previously,  we  would  expect  a  priori  that  one  can  show  that 
HMAC  is  indifferentiable  from  a  keyed  RO  even  when  the  attacker  can  query 
arbitrary  keys.  Then  one  could  apply  the  composition  theorem  of  [29]  to  derive 
proofs  of  security  for  the  settings  just  discussed. 

Weak key pairs in HMAC. We are the first to observe that HMAC has weak key pairs. First, there exist $K \ne K'$ for which $\mathrm{HMAC}(K, M) = \mathrm{HMAC}(K', M)$. These pairs of keys arise because of HMAC's ambiguous encoding of differing-length keys. Trivial examples of such “colliding” keys include any $K, K'$ for which either $|K| < d$ and $K' = K \| 0^s$ (for any $1 \le s \le d - |K|$), or $|K| > d$ and $K' = H(K)$. Colliding keys enable an easy attack that distinguishes $\mathrm{HMAC}(\cdot,\cdot)$ from a random function $\mathcal{R}(\cdot,\cdot)$, which also violates the indifferentiability of HMAC. On the other hand, as long as $H$ is collision-resistant, two keys of the same length can never collide. Still, even if we restrict attention to (non-colliding) keys of a fixed length, there exist weak key pairs, but of a different form that we term ambiguous. An example of an ambiguous key pair is $K, K'$ of length $d$ bits such that $K \oplus \mathrm{ipad} = K' \oplus \mathrm{opad}$. Because the second least significant bits of ipad and opad differ (see Section 4), and assuming $d > n - 2$, ambiguous key pairs of a fixed length $k$ only exist for $k \in \{d-1, d\}$. The existence of ambiguous key pairs in HMAC leads to negative results like those given for $H^2$. In particular, we straightforwardly extend the $H^2$ distinguisher to give one that lower bounds the number of queries any indifferentiability simulator must make for HMAC.
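Both kinds of weak key pairs can be observed directly with Python's `hmac` module. This sketch again assumes $H$ = SHA-256 with a 64-byte block and uses only byte-aligned keys (so the padding $0^s$ is a whole number of zero bytes); the key values are arbitrary examples. For an ambiguous pair $K_1, K_2$ with $K_1 \oplus \mathrm{ipad} = K_2 \oplus \mathrm{opad}$, it checks that the HMAC output under $K_2$ equals the inner hash value used when computing HMAC under $K_1$, mirroring the structure of $H^2$.

```python
import hashlib
import hmac

D = 64                     # block size d of SHA-256, in bytes (assumption: H = SHA-256)
IPAD = bytes([0x36]) * D   # RFC 2104 inner pad
OPAD = bytes([0x5C]) * D   # RFC 2104 outer pad


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


M = b"any message"

# Colliding keys with |K| < d: appending zero bytes leaves HMAC unchanged.
K = b"secret"
short_collision = mac(K, M) == mac(K + b"\x00" * 5, M)

# Colliding keys with |K| > d: K and H(K) yield the same HMAC.
K_long = b"L" * 100
long_collision = mac(K_long, M) == mac(hashlib.sha256(K_long).digest(), M)

# Ambiguous pair of d-bit keys: K1 xor ipad == K2 xor opad.
K1 = bytes(range(D))
K2 = xor(xor(K1, IPAD), OPAD)
# HMAC under K2 equals H(K1 xor ipad || X) for X = H(K2 xor ipad || M),
# i.e. it is the inner (intermediate) value of computing HMAC(K1, X).
X = hashlib.sha256(xor(K2, IPAD) + M).digest()
ambiguous = mac(K2, M) == hashlib.sha256(xor(K1, IPAD) + X).digest()

print(short_collision, long_collision, ambiguous)   # True True True
```

For a random function, outputs under two distinct keys would coincide only with probability about $2^{-256}$, so either of the first two checks already distinguishes HMAC with arbitrary keys from a keyed RO.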

Upper bounds for HMAC. Fortunately, it would seem that weak key pairs do not arise in typical applications. Using HMAC with keys of some fixed bit length smaller than $d - 1$ avoids weak key pairs. This holds for several applications; for example, the recommendation with HKDF is to use $n$-bit uniformly chosen salts as HMAC keys. This motivates finding positive results for HMAC when one avoids the corner cases that allow attackers to exploit weak key pairs.

Indeed,  as  our  main  positive  result,  we  prove  that,  should  H  be  a  RO  or 
an  MD  hash  with  ideal  compression  functions,  HMAC  is  indifferentiable  from 
a  keyed  RO  for  all  distinguishers  that  do  not  query  weak  key  pairs.  Our  result 
holds  for  the  case  that  the  keys  queried  are  of  length  d  or  less.  This  upper  bound 
enjoys  the  best,  birthday-bound  level  of  concrete  security  possible  (up  to  small 
constants),  and  provides  the  first  positive  result  about  the  indifferentiability  of 
the  HMAC  construction. 

1.3  Discussion 

The structural properties within $H^2$ and HMAC are, in theory, straightforward to avoid. Indeed, as mentioned above, Coron et al. [13] prove that the construction $H^2(0^d \| M)$ is indifferentiable from a RO, where $H$ is MD using a compression function with block size $d$ bits and chaining value length $n < d$ bits. Analogously, our positive results about HMAC imply as a special case that HMAC with any fixed constant key $K$ is indifferentiable from a RO.

We emphasize that we are unaware of any deployed cryptographic application for which the use of $H^2$ or HMAC leads to a vulnerability. Still, our results show that future applications should, in particular, be careful when using HMAC with keys that are under partial control of the attacker. More importantly, our results demonstrate the importance of provable security in the design of hash functions (and elsewhere in cryptography), as opposed to the more common “attack-fix” cycle. For example, the hash-of-hash suggestion of Ferguson and Schneier [20] was motivated by preventing the extension attack. Unfortunately, in so doing they accidentally introduced a more subtle (although less dangerous) attack, which was not present in the original design.⁵ Indeed, we discovered the subtlety of the problems within $H^2$ and HMAC, including our explicit attacks, only after attempting to prove indifferentiability of these constructions (with typical, good bounds). In contrast, the existing indifferentiability proofs of (seemingly) small modifications of these hash functions, such as $H^2(0^d \| M)$ [13], provably rule out these attacks.


1.4  Prior  Work 

There exists a large body of work showing hash functions are indifferentiable from a RO (cf. [1,7,11-13,15,16,23]), including analyses of variants of $H^2$ and HMAC. As mentioned, a construction called HMAC was analyzed in [13], but this construction is not HMAC as standardized. Krawczyk [28] suggests that the analysis of $H^2(0^d \| M)$ extends to the case of HMAC, but does not offer a proof.⁶ HMAC has received much analysis in other contexts. Proofs of its security as a pseudorandom function under reasonable assumptions appear in [3,5]. These rely on keys being uniform and secret, making the analyses inapplicable to other settings. Analyses of HMAC's security as a randomness extractor appear in [14,21]. These results provide strong information-theoretic guarantees that HMAC can be used as a key derivation function, but only in settings where the source has a relatively large amount of min-entropy. This requirement makes the analyses insufficient to argue security in many settings of practical importance. See [28] for further discussion.

Full  version.  Due  to  space  constraints,  many  of  our  technical  results  and 
proofs  are  deferred  to  the  full  version  of  this  paper  [17]. 

2  Preliminaries 

Notation and games. We denote the empty string by $\lambda$. If $|X| < |Y|$, then $X \oplus Y$ signifies that $X$ is first padded with $|Y| - |X|$ zeros. For set $\mathcal{X}$ and value $x$, we write $\mathcal{X} \xleftarrow{\cup} x$ to denote $\mathcal{X} \gets \mathcal{X} \cup \{x\}$. For non-empty sets Dom and Rng with $|\mathrm{Rng}|$ finite, and a set Keys, a random oracle $f\colon \mathrm{Keys} \times \mathrm{Dom} \to \mathrm{Rng}$ is a function chosen uniformly at random from the space of all possible functions $\mathrm{Keys} \times \mathrm{Dom} \to \mathrm{Rng}$. We will sometimes refer to random oracles as keyed when Keys is non-empty, whereas we omit the first parameter when $\mathrm{Keys} = \emptyset$.
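In code-based games, a random oracle is usually implemented by lazy sampling: rather than choosing the whole function up front, each fresh input receives an independent uniform output that is then remembered, which yields the same distribution over answers. A minimal Python sketch, assuming outputs in $\{0,1\}^n$ represented as integers and Python's non-cryptographic `random` module (adequate only for simulation); `LazyRandomOracle` is a hypothetical name, and passing `key=None` stands in for the keyless case.

```python
import random


class LazyRandomOracle:
    """Keyed random oracle R: Keys x Dom -> {0,1}^n, sampled lazily."""

    def __init__(self, n_bits: int, seed=None) -> None:
        self.n_bits = n_bits
        self.rng = random.Random(seed)   # simulation only, not cryptographic
        self.table = {}                  # (key, message) -> n-bit integer

    def __call__(self, key, msg):
        point = (key, msg)
        if point not in self.table:      # fresh point: sample a uniform answer
            self.table[point] = self.rng.getrandbits(self.n_bits)
        return self.table[point]         # repeated point: same answer as before


R = LazyRandomOracle(n_bits=128, seed=1)
a = R(b"k1", b"hello")
R(None, b"hello")                        # keyless use: key=None
print(a == R(b"k1", b"hello"), 0 <= a < 2**128, len(R.table))   # True True 2
```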

We use code-based games [10] to formalize security notions and within our proofs. In the execution of a game $G$ with adversary $A$, we denote by $G^A \Rightarrow \mathsf{true}$ the event that the game outputs true and by $A^G \Rightarrow y$ the event that the adversary outputs $y$. Fixing some RAM model of computation, our convention is that the running time $\mathrm{Time}(A)$ of an algorithm $A$ includes its code size. Queries are unit cost, and we will restrict attention to the absolute worst-case running time, which must hold regardless of how queries are answered.

⁵ We note the prescience of the proposers of $H^2$, who themselves suggested further analysis was needed [20].

⁶ Fortunately, the HKDF application of [28] seems to avoid weak key pairs, and thus our positive results for HMAC appear to validate this claim [28] for this particular application.

Hash functions. A hash function $H[P]\colon \mathrm{Keys} \times \mathrm{Dom} \to \mathrm{Rng}$ is a family of functions from Dom to Rng, indexed by a set Keys, that possibly uses (black-box) access to an underlying primitive $P$ (e.g., a compression function). We call the hash function keyed if Keys is non-empty, and keyless otherwise. (In the latter case, we omit the first parameter.) We assume that the number of applications of $P$ in computing $H[P](K, M)$ is the same for all $K, M$ with the same value of $|K| + |M|$. This allows us to define the cost of computing a hash function $H[P]$ on a key and message whose combined length is $\ell$, denoted $\mathrm{Cost}(H, \ell)$, as the number of calls to $P$ required to compute $H[P](K, M)$ for $K, M$ with $|K| + |M| = \ell$. For a keyed random oracle $\mathcal{R}\colon \mathrm{Keys} \times \mathrm{Dom} \to \mathrm{Rng}$, we fix the convention that $\mathrm{Cost}(\mathcal{R}, \ell) = 1$ for any $\ell$ for which there exist a key $K \in \mathrm{Keys}$ and a message $M \in \mathrm{Dom}$ such that $|K| + |M| = \ell$.

A compression function is a hash function for which $\mathrm{Dom} = \{0,1\}^n \times \{0,1\}^d$ and $\mathrm{Rng} = \{0,1\}^n$ for some numbers $n, d > 0$. Our focus will be on keyless compression functions, meaning those of the form $f\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^n$. Our results lift in a straightforward way to the dedicated-key setting [8]. The $\ell$-th iterate of $H[P]$ is denoted $H^\ell[P]$ and defined for $\ell > 0$ by $H^\ell[P](X) = H[P](H[P](\cdots H[P](X)\cdots))$, where the number of applications of $H$ is $\ell$. We let $H^0[P](X) = X$. We will often write $H$ instead of $H[P]$ when the underlying primitive $P$ is clear or unimportant.

Merkle-Damgård. Let $\mathrm{Pad}\colon \{0,1\}^{\le L} \to (\{0,1\}^d)^+$ be an injective padding function. The one used in many of the hash functions within the SHA family outputs $M \| 1 0^r \| \langle |M| \rangle_{64}$, where $\langle |M| \rangle_{64}$ is the encoding of the length of $M$ as a 64-bit string and $r$ is the smallest number making the length a multiple of $d$. This makes $L = 2^{64} - 1$. The function $\mathrm{MD}[f]\colon (\{0,1\}^d)^+ \to \{0,1\}^n$ is defined as

$$\mathrm{MD}[f](M) = f(f(\cdots f(f(IV, M_1), M_2), \cdots), M_k)$$

where $|M| = kd$ and $M = M_1 \| \cdots \| M_k$. The function $\mathrm{SMD}[f]\colon \{0,1\}^{\le L} \to \{0,1\}^n$ is defined as $\mathrm{SMD}[f](M) = \mathrm{MD}[f](\mathrm{Pad}(M))$.
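The following sketch implements SHA-style padding and the MD/SMD iteration for byte-aligned messages with $d = 512$ and $n = 256$. The compression function `f` is a hypothetical stand-in built from SHA-256 (it is not SHA-256's actual compression function), and the all-zero IV is likewise an assumption. Counting calls to `f` also illustrates the Cost convention: the number of compression-function calls depends only on the message length.

```python
import hashlib

D_BYTES = 64          # block size d = 512 bits, as in SHA-256
N_BYTES = 32          # chaining value / output length n = 256 bits
IV = bytes(N_BYTES)   # hypothetical all-zero IV, for illustration only


def pad(msg: bytes) -> bytes:
    """SHA-style Pad(M) = M || 1 0^r || <|M|>_64 for byte-aligned messages."""
    bit_len = 8 * len(msg)
    out = msg + b"\x80"                              # the 1 bit, then seven 0 bits
    out += b"\x00" * ((56 - len(out)) % D_BYTES)     # remaining 0^r, leaving 8 bytes
    return out + bit_len.to_bytes(8, "big")          # 64-bit length encoding


def f(h: bytes, block: bytes) -> bytes:
    """Hypothetical compression function {0,1}^n x {0,1}^d -> {0,1}^n."""
    return hashlib.sha256(h + block).digest()


def md(m: bytes) -> tuple:
    """MD[f](M) for |M| = kd, k >= 1; returns (digest, number of calls to f)."""
    if not m or len(m) % D_BYTES:
        raise ValueError("input must be a positive number of d-bit blocks")
    h, calls = IV, 0
    for i in range(0, len(m), D_BYTES):
        h = f(h, m[i:i + D_BYTES])   # chain through blocks M_1, ..., M_k
        calls += 1
    return h, calls


def smd(msg: bytes) -> tuple:
    """SMD[f](M) = MD[f](Pad(M))."""
    return md(pad(msg))


for m in (b"", b"a" * 55, b"a" * 56, b"a" * 119):
    digest, cost = smd(m)
    print(len(m), len(pad(m)), cost)
# prints: 0 64 1, 55 64 1, 56 128 2, 119 128 2 (one line each)
```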

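Indifferentiability from a RO. Let $\mathcal{R}\colon \mathrm{Keys} \times \mathrm{Dom} \to \mathrm{Rng}$ be a random oracle. Consider a hash construction $H[P]\colon \mathrm{Keys} \times \mathrm{Dom} \to \mathrm{Rng}$ from an ideal primitive $P$. Let game $\mathrm{Real}_{H[P]}$ be the game whose main procedure runs an adversary $\mathcal{D}^{\mathrm{Func},\mathrm{Prim}}$ and returns the bit that $\mathcal{D}$ outputs. The procedure Func on input $K \in \mathrm{Keys}$ and $M \in \mathrm{Dom}$ returns $H[P](K, M)$. The procedure Prim on input $X$ returns $P(X)$. For a simulator $S$, let game $\mathrm{Ideal}_{\mathcal{R},S}$ be the game whose main procedure runs an adversary $\mathcal{D}^{\mathrm{Func},\mathrm{Prim}}$ and returns the bit that $\mathcal{D}$ outputs. The procedure Func on input $K \in \mathrm{Keys}$ and $M \in \mathrm{Dom}$ returns $\mathcal{R}(K, M)$. The procedure Prim on input $X$ returns $S^{\mathcal{R}}(X)$. The indifferentiability advantage of $\mathcal{D}$ is defined as

$$\mathrm{Adv}^{\mathrm{indiff}}_{H[P],\mathcal{R},S}(\mathcal{D}) = \Pr\left[\mathrm{Real}^{\mathcal{D}}_{H[P]} \Rightarrow 1\right] - \Pr\left[\mathrm{Ideal}^{\mathcal{D}}_{\mathcal{R},S} \Rightarrow 1\right].$$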


We focus on simulators that must work for any adversary, though our negative results extend as well to the weaker setting in which the simulator can depend on the adversary. The total query cost $\sigma$ of an adversary $\mathcal{D}$ is the cumulative cost of all its Func queries plus $q_2$, the number of its Prim queries. (This makes $\sigma$ the total number of $P$ uses in game $\mathrm{Real}_{H[P]}$. In line with our worst-case conventions, the same maximums hold in $\mathrm{Ideal}_{\mathcal{R},S}$, although there they do not translate to $P$ applications.)

We  note  that  when  Keys  is  non-empty,  indifferentiability  here  follows  [8]  and 
allows  the  distinguisher  to  choose  keys  during  an  attack.  This  reflects  the  desire 
for  a  keyed  hash  function  to  be  indistinguishable  from  a  keyed  random  oracle 
for  arbitrary  uses  of  the  key  input. 
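To make the two games concrete, here is a toy Python sketch for the keyless case, taking the construction to be $H^2[P] = P \circ P$ with $P$ a lazily sampled random oracle. It pairs the real world with a hypothetical naive simulator that ignores $\mathcal{R}$ and answers Prim queries with independent random values; a one-line distinguisher that checks whether $\mathrm{Func}(M) = \mathrm{Prim}(\mathrm{Prim}(M))$ tells the two worlds apart. This only illustrates why a simulator must consult $\mathcal{R}$; it is not the distinguisher studied in Section 3.2, and all names are illustrative.

```python
import random


class LazyRO:
    """Lazily sampled random oracle with n-bit integer outputs (simulation only)."""

    def __init__(self, n_bits: int, rng: random.Random) -> None:
        self.n_bits, self.rng, self.table = n_bits, rng, {}

    def __call__(self, x: int) -> int:
        if x not in self.table:
            self.table[x] = self.rng.getrandbits(self.n_bits)
        return self.table[x]


def real_world(n_bits: int, rng: random.Random):
    """Return (Func, Prim) = (H^2[P], P) with H = P a random oracle."""
    P = LazyRO(n_bits, rng)

    def func(m: int) -> int:
        return P(P(m))

    return func, P


def ideal_world_naive(n_bits: int, rng: random.Random):
    """Return (Func, Prim) = (R, S) for a naive simulator S that never queries R."""
    return LazyRO(n_bits, rng), LazyRO(n_bits, rng)


def distinguisher(func, prim) -> int:
    m = 12345
    return int(func(m) == prim(prim(m)))   # 1 iff Func is consistent with Prim twice


rng = random.Random(2024)
print(distinguisher(*real_world(128, rng)))          # 1
print(distinguisher(*ideal_world_naive(128, rng)))   # 0 (fails only w.p. about 2^-128)
```

A simulator that instead answers $\mathrm{Prim}(\mathrm{Prim}(M))$ with $\mathcal{R}(M)$ defeats this particular check, which is the kind of consistency a real simulator must maintain.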


3  Second  Iterates  and  their  Security 

Our investigation begins with the second iterate of a hash function, meaning $H^2(M) = H(H(M))$ where $H\colon \mathrm{Dom} \to \mathrm{Rng}$ for sets $\mathrm{Dom} \supseteq \mathrm{Rng}$. For simplicity, let $\mathrm{Rng} = \{0,1\}^n$ and assume that $H$ is itself modeled as a RO. Is $H^2$ good in the sense of being like a RO? Given that we are modeling $H$ as a RO, we would expect the answer to be “yes”. The truth is more involved. As we will see in Section 4, similar subtleties exist in the case of the related HMAC construction.

We start with the following observations. When computing $H^2(M)$ for some $M$, we refer to the value $H(M)$ as an intermediate value. Then, we note that the value $Y = H^2(M)$ is in fact the intermediate value used when computing $H^2(X)$ for $X = H(M)$. Given $Y = H^2(M)$, then, one can compute $H^2(H(M))$ directly by computing $H(Y)$. That the hash value $Y$ is also the intermediate value used in computing the hash of another message is cause for concern: other hash function constructions that are indifferentiable from a RO (cf. [2,7,8,13,23]) explicitly attempt to ensure that outputs are not intermediate values (with overwhelming probability over the randomness of the underlying idealized primitive). Moreover, prior constructions for which hash values are intermediate values have been shown not to be indifferentiable from a RO. For example, Merkle-Damgård-based iterative hashes fall to extension attacks [13] for this reason. Unlike with Merkle-Damgård, however, it is not immediately clear how an attacker might abuse the structure of $H^2$.

We turn our attention to hash chains, where potential issues arise. For a hash function $H$, we define a hash chain $Y = (Y_0, \ldots, Y_\ell)$ to be a sequence of $\ell + 1$ values where $Y_0$ is a message and $Y_i = H(Y_{i-1})$ for $1 \le i \le \ell$. Likewise, when using $H^2$, a hash chain $Y = (Y_0, \ldots, Y_\ell)$ is a sequence of $\ell + 1$ values where $Y_0$ is a message and $Y_i = H^2(Y_{i-1})$ for $1 \le i \le \ell$. We refer to $Y_0$ as the start of the hash chain and $Y_\ell$ as the end. Two chains $Y, Y'$ are non-overlapping if no value in one chain occurs in the other, meaning $Y_i \ne Y'_j$ for all $0 \le i, j \le \ell$.
[Figure not reproduced.]

Fig. 1. Diagram of two hash chains $Y = (Y_0, \ldots, Y_\ell)$ and $Y' = (Y'_0, \ldots, Y'_\ell)$ for hash function $H^2$.


For any hash function, given the start and end of a hash chain $Y = (Y_0, \ldots, Y_\ell)$, one can readily compute the start and end of a new chain with just two hash calculations. That is, set $Y'_0 \gets H(Y_0)$ and $Y'_\ell \gets H(Y_\ell)$. However, the chain $Y' = (Y'_0, \ldots, Y'_\ell)$ and the chain $Y$ overlap (indeed, $Y'_i = Y_{i+1}$ for $0 \le i < \ell$). For good hash functions (i.e., ones that behave like a RO), computing the start and end of a non-overlapping chain given the start and end $Y_0, Y_\ell$ of a chain requires at least $\ell$ hash computations (assuming $\ell \ll 2^{n/2}$).

Now consider $H^2$. Given the start and end of a chain $Y = (Y_0, \ldots, Y_\ell)$, one can readily compute the start and end of a non-overlapping chain $Y' = (Y'_0, \ldots, Y'_\ell)$ using just two hash computations instead of the expected $2\ell$ computations. Namely, let $Y'_0 \gets H(Y_0)$ and $Y'_\ell \gets H(Y_\ell)$. Then these are the start and end of the chain $Y' = (Y'_0, \ldots, Y'_\ell)$ because

$$H^{2\ell}(Y'_0) = H^{2\ell}(H(Y_0)) = H(H^{2\ell}(Y_0)) = H(Y_\ell) = Y'_\ell,$$

which we call the chain-shift property of $H^2$. Moreover, assuming $H$ is itself a RO outputting $n$-bit strings, the two chains $Y, Y'$ do not overlap with probability at least $1 - (2\ell + 2)^2/2^n$. Figure 1 provides a pictorial diagram of the two chains $Y$ and $Y'$.
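The chain-shift property, and the way the two chains interleave, can be checked numerically. This sketch assumes SHA-256 as a stand-in for $H$ and an arbitrary example start value; `h2_chain` is a hypothetical helper. It verifies that $Y'_i = H(Y_i)$ for every $i$ (the case $i = \ell$ is the chain-shift property), so $Y'$ consists exactly of the intermediate values produced while computing $Y$, and that the two chains share no value.

```python
import hashlib


def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()   # stand-in for the ideal hash H


def h2_chain(start: bytes, ell: int) -> list:
    """Hash chain (Y_0, ..., Y_ell) for H^2, with Y_i = H(H(Y_{i-1}))."""
    chain = [start]
    for _ in range(ell):
        chain.append(H(H(chain[-1])))
    return chain


ell = 50
Y = h2_chain(b"start of chain Y", ell)
Y_prime = h2_chain(H(Y[0]), ell)        # Y'_0 = H(Y_0)

# Y'_i = H^{2i}(H(Y_0)) = H(H^{2i}(Y_0)) = H(Y_i); i = ell is the chain shift.
interleaved = all(Y_prime[i] == H(Y[i]) for i in range(ell + 1))

# Non-overlapping: no value of one chain occurs in the other.
disjoint = set(Y).isdisjoint(Y_prime)

print(interleaved, disjoint)   # True True
```

Seen as a single $H$-chain starting at $Y_0$, the values of $Y$ sit at even positions and those of $Y'$ at odd positions, which is why they coincide only if $H$ cycles.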


3.1  A  Vulnerable  Application:  Mutual  Proofs  of  Work 

In  the  last  section  we  saw  that  the  second  iterate  fails  to  behave  like  a  RO  in 
the  context  of  hash  chains.  But  the  security  game  detailed  in  the  last  section 
may  seem  far  removed  from  real  protocols.  For  example,  it’s  not  clear  where 
an  attacker  would  be  tasked  with  computing  hash  chains  in  a  setting  where  it, 
too,  was  given  an  example  hash  chain.  We  suggest  that  just  such  a  setting  could 
arise  in  protocols  in  which  parties  want  to  assert  to  each  other,  in  a  verifiable 
way,  that  they  performed  some  amount  of  computation.  Such  a  setting  could 
arise  when  parties  must  (provably)  compare  assertions  of  computational  power, 
as when using cryptographic puzzles [18,19,24,25,33,35]. Or it might arise when the parties want to verifiably calibrate the differing computational speeds of their computers. We refer to this task as a mutual proof of work.

Mutual proofs of work. For the sake of brevity, we present an example hash-chain-based protocol and dispense with a more general treatment of mutual proofs of work. Consider the two-party protocol shown in the left diagram of Figure 2. Each party initially chooses a random nonce and sends it to the other. Then, each party computes a hash chain of some length (chosen by the computing party) starting with the nonce chosen by the other party, and sends the chain's output along with the chain's length to the other party. At this point, both parties have given a witness that they performed a certain amount of work. So now each party checks the other's asserted computation, determining whether the received value is the value resulting from chaining together the indicated number of hash applications, and checking that the hash chains used by each party are non-overlapping. Note that unlike puzzles, which require fast verification, here the verification step is as costly as the puzzle solution.

[Figure not reproduced.]

Fig. 2. Example protocol (left) and adversarial $\mathcal{P}_2$ security game (right) for mutual proofs of work.
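A minimal sketch of an honest execution of the protocol just described, assuming SHA-256 as the hash function $H$, 16-byte nonces, and arbitrary example chain lengths; the helper names are hypothetical and the exact message formats of Figure 2 are not reproduced. Verification recomputes the other party's chain, which is why it costs as much as the original work.

```python
import hashlib
import secrets


def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()   # assumed hash function for the protocol


def chain(start: bytes, ell: int) -> list:
    """All values (start, H(start), ..., H^ell(start)) of a hash chain."""
    values = [start]
    for _ in range(ell):
        values.append(H(values[-1]))
    return values


def verify(nonce: bytes, claimed_len: int, claimed_end: bytes, own_chain: list) -> bool:
    """Recompute the other party's chain; check its end and non-overlap with ours."""
    theirs = chain(nonce, claimed_len)
    return theirs[-1] == claimed_end and set(theirs).isdisjoint(own_chain)


# Step 1: nonces. X2 is chosen by P1 and X1 by P2; each party hashes the other's nonce.
X1 = secrets.token_bytes(16)
X2 = secrets.token_bytes(16)

# Step 2: each party picks its own chain length and computes its chain.
l1, l2 = 1000, 1500
chain1 = chain(X1, l1)    # P1's work, ending in Y1
chain2 = chain(X2, l2)    # P2's work, ending in Y2

# Step 3: each party sends (length, end); the other re-does the work to check it.
print(verify(X1, l1, chain1[-1], chain2))   # P2 checks P1: True
print(verify(X2, l2, chain2[-1], chain1))   # P1 checks P2: True
```

Claiming a longer chain than was actually computed fails, since the recomputed end differs from the one sent.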

The goal of the protocol is to ensure that the other party did compute exactly its declared number of iterations. Slight changes to the protocol would lead to easy ways of cheating. For example, if during verification the parties did not check that the chains are non-overlapping, then $\mathcal{P}_2$ could easily cheat by choosing $X_1$ so that it can reuse a portion of the chain computed by $\mathcal{P}_1$.


Security would be achieved should no cheating party succeed at convincing an honest party while using less than $\ell_1$ (resp. $\ell_2$) work to compute $Y_1$ (resp. $Y_2$). The game $\mathrm{POW}_{H[P],n,\ell_1}$ formalizes this security goal for a cheating $\mathcal{P}_2$; see the right portion of Figure 2. We let $\mathrm{Adv}^{\mathrm{pow}}_{H[P],n,\ell_1}(A) = \Pr[\mathrm{POW}^A_{H[P],n,\ell_1} \Rightarrow \mathsf{true}]$. Note that the adversary $A$ only wins should it make $q < \ell_2 \cdot \mathrm{Cost}(H, n)$ queries, where $\ell_2$ is the value it declared and $\mathrm{Cost}(H, n)$ is the cost of computing $H$ on an $n$-bit input. Again we will consider both the hash function $H[P](M) = P(M)$ that just applies a RO $P$ and also $H^2[P](M) = P(P(M))$, the second iterate of a RO. In the former case the adversary can make at most $\ell_2 - 1$ queries, and in the latter case at most $2\ell_2 - 1$.


When $H[P](M) = P(M)$, no adversary making $q < \ell_2$ queries to Prim can win the $\mathrm{POW}_{H[P],n,\ell_1}$ game with high advantage. Intuitively, the reason is that, despite being given $X_1$ and $Y_1$ where $Y_1 = P^{\ell_1}(X_1)$, a successful attacker must still compute a full $\ell_2$-length chain, and this requires $\ell_2$ calls to $P$. A more formal treatment appears in the full version.
Attack against any second iterate. Now let us analyze this protocol's security when we use as hash function $H^2[P](M) = P(P(M))$ for a RO $P\colon \mathrm{Dom} \to \mathrm{Rng}$ with $\mathrm{Rng} \subseteq \mathrm{Dom}$. We can abuse the chain-shift property of $H^2$ in order to win the $\mathrm{POW}_{H^2[P],n,\ell_1}$ game for any $n > 0$ and $\ell_1 \ge 2$. Our adversary $A$ works as follows. It receives $X_2$ and then chooses its nonce as $X_1 \gets \mathrm{Prim}(X_2)$. When it later receives $Y_1 = P^{2\ell_1}(X_1)$, the adversary proceeds by setting $\ell_2 = \ell_1 + 1$ and $Y_2 \gets \mathrm{Prim}(Y_1)$. Then by the chain-shift property we have that

$$Y_2 = P(Y_1) = P(P^{2\ell_1}(X_1)) = P(P^{2\ell_1}(P(X_2))) = P^{2\ell_2}(X_2).$$

The two chains will be non-overlapping with high probability (over the coins used by $P$). Finally, $A$ makes only 2 queries to Prim, so the requirement that $q < 2\ell_2$ is met whenever $\ell_1 \ge 1$.
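The attack can be run end to end in a few lines. This sketch assumes SHA-256 as a stand-in for the random oracle $P$, so that $H^2[P] = P \circ P$, uses hypothetical helper names and an arbitrary $\ell_1$, and follows the prose description rather than the exact game of Figure 2: it counts the cheating party's Prim queries and then lets the honest party verify the claimed chain (end value and non-overlap).

```python
import hashlib
import secrets


class Prim:
    """SHA-256 as a stand-in for the RO P, counting queries."""

    def __init__(self) -> None:
        self.queries = 0

    def __call__(self, x: bytes) -> bytes:
        self.queries += 1
        return hashlib.sha256(x).digest()


def h2_chain(P, start: bytes, ell: int) -> list:
    """Hash chain (start, ..., (H^2)^ell(start)) for H^2 = P o P."""
    values = [start]
    for _ in range(ell):
        values.append(P(P(values[-1])))
    return values


honest_P1, attacker_P = Prim(), Prim()

X2 = secrets.token_bytes(16)          # honest P1's nonce, sent to the attacker
X1 = attacker_P(X2)                   # attacker's nonce: X1 <- Prim(X2)

l1 = 1000
Y1 = h2_chain(honest_P1, X1, l1)[-1]  # honest P1's work: 2 * l1 queries

l2 = l1 + 1                           # attacker's claimed chain length
Y2 = attacker_P(Y1)                   # the whole "proof" costs one more query

# Honest P1 verifies the claim: end value and non-overlapping chains.
verifier = Prim()
their_chain = h2_chain(verifier, X2, l2)
my_chain = h2_chain(verifier, X1, l1)
accepted = their_chain[-1] == Y2 and set(their_chain).isdisjoint(my_chain)

print(accepted, attacker_P.queries, 2 * l2)   # True 2 2002
```

The attacker is accepted after 2 queries, far below the $2\ell_2$ applications of $P$ that an honest chain of the claimed length requires.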

Discussion. As far as we are aware, mutual proofs of work have not been considered before; the concept may indeed be of independent interest. A full treatment is beyond the scope of this work. We also note that, of course, it is easy to modify the protocols using $H^2$ to be secure. Providing secure constructions was not our goal; rather, we wanted to show protocols that are insecure using $H^2$ but secure when $H^2$ is replaced by a monolithic RO. This illustrates how, hypothetically, the structure of $H^2$ could give rise to subtle vulnerabilities in an application.


3.2  Indifferentiability  Lower  and  Upper  Bounds 

In this section we prove that any indifferentiability proof for the second iterate $H^2$ is subject to inherent quantitative limitations. Recall that indifferentiability asks for a simulator $S$ such that no adversary can distinguish between the pair of oracles $H^2[P], P$ and the pair $\mathcal{R}, S$, where $P$ is some underlying ideal primitive and $\mathcal{R}$ is a RO with the same domain and range as $H^2$. The simulator can make queries to $\mathcal{R}$ to help it in its simulation of $P$. Concretely, building on the ideas behind the above attacks in the context of hash chains, we show that in order to withstand a differentiating attack with $q$ queries, any simulator for $H^2[P]$, for any hash construction $H[P]$ with output length $n$, must issue at least $\Omega(\min\{q^2, 2^{n/2}\})$ queries to the RO $\mathcal{R}$. As we explain below, such a lower bound severely limits the concrete security level that can be inferred by using the composition theorem for indifferentiability, effectively neutralizing the benefits of using indifferentiability in the first place.

THE DISTINGUISHER. In the following, we let $H = H[P]$ be an arbitrary hash function with $n$-bit outputs relying on a primitive $P$, such as a fixed-input-length random oracle or an ideal cipher. We are therefore addressing an arbitrary second iterate, and not focusing on some particular ideal primitive $P$ (such as a RO as in previous sections) or construction $H$. Indeed, $H$ could equally well be Merkle-Damgård and $P$ an ideal compression function, or $H$ could be any number of indifferentiable hash constructions using an appropriate ideal primitive $P$.

Recall that Func and Prim are the oracles associated with construction and primitive queries to $H^2 = H^2[P]$ and $P$, respectively. Let $w, \ell$ be parameters (for now, think for convenience of $w = \ell$). The attacker $\mathcal{D}_{w,\ell}$ starts by issuing $\ell$ queries to Func to compute a chain of $n$-bit values $(x_0, x_1, \ldots, x_\ell)$ where $x_i = H^2(x_{i-1})$ and $x_0$ is a random $n$-bit string. Then, it also picks a random index $j \in [1..w]$, and creates a list of $n$-bit strings $u[1], \ldots, u[w]$ with $u[j] = x_\ell$, and all remaining $u[i]$ for $i \neq j$ are chosen uniformly and independently. Then, for all $i \in [1..w]$, the distinguisher $\mathcal{D}_{w,\ell}$ proceeds in asking all Prim queries in order to compute $v[i] = H(u[i])$. Subsequently, the attacker computes $y_0 = H(x_0)$ via Prim queries, and also computes the chain $(y_0, y_1, \ldots, y_\ell)$ such that $y_i = H^2(y_{i-1})$ by making $\ell$ Func queries. Finally, it decides to output 1 if and only if $y_\ell = v[j]$ and $x_\ell$ as well as $v[i]$ for $i \neq j$ are not in $\{y_0, y_1, \ldots, y_\ell\}$. The attacker $\mathcal{D}_{w,\ell}$ therefore issues a total of $2\ell$ Func queries and $(w+1) \cdot \mathrm{Cost}(H, n)$ Prim queries.

In the real-world experiment, the distinguisher $\mathcal{D}_{w,\ell}$ outputs 1 with very high probability, as the condition $y_\ell = v[j]$ always holds by the chain-shifting property of $H^2$: both values equal $H^{2\ell+1}(x_0)$. In fact, the only reason for $\mathcal{D}$ outputting 0 is that one of $x_\ell$ and $v[i]$ for $i \neq j$ incidentally happens to be in $\{y_0, y_1, \ldots, y_\ell\}$. The (typically small) probability that this occurs obviously depends on the particular construction $H[P]$ at hand; it is thus convenient to define the shorthand

$$p(H, w, \ell) = \Pr\big[\{x_\ell, H(U_1), \ldots, H(U_{w-1})\} \cap \{y_0, y_1, \ldots, y_\ell\} \neq \emptyset\big],$$

where $x_0, y_0, x_1, \ldots, y_{\ell-1}, x_\ell, y_\ell$ are the intermediate values of a chain of $2\ell + 1$ consecutive evaluations of $H[P]$ starting at a random $n$-bit string $x_0$, and $U_1, \ldots, U_{w-1}$ are further independent random $n$-bit values. In the full version of this paper we prove that for $H[P] = P = \mathcal{R}$ for a random oracle $\mathcal{R}: \{0,1\}^* \to \{0,1\}^n$ we have $p(H, w, \ell) = O((w\ell + \ell^2)/2^n)$. Similar reasoning can be applied to essentially all relevant constructions.
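To make the real-world experiment concrete, the following C++ sketch runs $\mathcal{D}_{w,\ell}$ with Func $= H^2$ and Prim $= H$ and checks the output condition directly. It is a minimal sketch under loud assumptions of our own: a 64-bit mixing function based on the SplitMix64 steps (the hypothetical `toy_hash`) stands in for $H[P]$ with $n = 64$; it is a fixed bijection, not a random oracle; and `distinguisher_real_world` is an illustrative name, not code from this work.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Toy stand-in for H[P] with n = 64, built from the SplitMix64 mixing steps.
// It is a fixed bijection, NOT a random oracle or a secure hash.
std::uint64_t toy_hash(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Second iterate H^2(x) = H(H(x)); this plays the role of Func.
std::uint64_t toy_hash2(std::uint64_t x) { return toy_hash(toy_hash(x)); }

// One real-world run of D_{w,l} (Func = H^2, Prim = H). Returns D's output bit.
bool distinguisher_real_world(int w, int ell, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    // l Func queries: x_0 random, x_i = H^2(x_{i-1}).
    std::vector<std::uint64_t> x(ell + 1);
    x[0] = rng();
    for (int i = 1; i <= ell; ++i) x[i] = toy_hash2(x[i - 1]);
    // Random index j (0-based here); u[j] = x_l, all other u[i] random.
    int j = std::uniform_int_distribution<int>(0, w - 1)(rng);
    std::vector<std::uint64_t> u(w), v(w);
    for (int i = 0; i < w; ++i) u[i] = (i == j) ? x[ell] : rng();
    for (int i = 0; i < w; ++i) v[i] = toy_hash(u[i]);  // Prim queries
    // y_0 = H(x_0) via Prim, then l Func queries y_i = H^2(y_{i-1}).
    std::vector<std::uint64_t> y(ell + 1);
    y[0] = toy_hash(x[0]);
    for (int i = 1; i <= ell; ++i) y[i] = toy_hash2(y[i - 1]);
    auto in_y = [&y](std::uint64_t z) {
        for (std::uint64_t yi : y)
            if (yi == z) return true;
        return false;
    };
    // Output 1 iff y_l = v[j] and neither x_l nor any v[i], i != j, is on the y-chain.
    if (y[ell] != v[j] || in_y(x[ell])) return false;
    for (int i = 0; i < w; ++i)
        if (i != j && in_y(v[i])) return false;
    return true;
}
```

Across seeds the function returns `true`, matching the claim that in the real world only an accidental overlap with the $y$-chain can make $\mathcal{D}$ output 0.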

In contrast, in the ideal-world experiment, we expect the simulator to be completely ignorant about the choice of $j$ as long as it does not learn $x_0$, and in particular it does not know $j$ while answering the Prim queries associated with the evaluations of $H(u[i])$. Consequently, the condition required for $\mathcal{D}_{w,\ell}$ to output 1 appears to force the simulator, for all $i \in [1..w]$, to prepare a distinct chain of $\ell$ consecutive $\mathcal{R}$ evaluations ending in $v[i]$, hence requiring $w \cdot \ell$ random oracle queries.

The following theorem quantifies the advantage achieved by the above distinguisher $\mathcal{D}_{w,\ell}$ in differentiating against any simulator for the construction $H[P]$. Its proof is given in the full version.


Theorem 1. [Attack against $H^2$] Let $H[P]$ be an arbitrary hash construction with $n$-bit outputs, calling a primitive $P$, and let $\mathcal{R}: \{0,1\}^* \to \{0,1\}^n$ be a random oracle. For all integer parameters $w, \ell \ge 1$, there exists an adversary $\mathcal{D}_{w,\ell}$ making $2\ell$ Func-queries and $(w+1) \cdot \mathrm{Cost}(H, n)$ Prim-queries such that for all simulators $S$,

$$\mathrm{Adv}^{\mathrm{indiff}}_{H^2[P],\mathcal{R},S}(\mathcal{D}_{w,\ell}) \ge 1 - p(H, w, \ell) - \frac{5\ell^2}{2^{n+1}} - \frac{q_S\,\ell}{2^n} - \frac{q_S^2}{2^n} - \frac{q_S}{w \cdot \ell} - \frac{1}{w},$$

where $q_S$ is the overall number of $\mathcal{R}$ queries by $S$ when replying to $\mathcal{D}_{w,\ell}$'s Prim queries. □
Discussion. We now elaborate on Theorem 1. If we consider the distinguisher $\mathcal{D}_{w,\ell}$ from Theorem 1, we observe that by the advantage lower bound in the theorem statement, if $\ell, w \ll 2^{n/4}$ and consequently $p(H, w, \ell) \approx 0$, the number of queries made by the simulator, denoted $q_S = q_S(2\ell, w+1)$, must satisfy $q_S = \Omega(w \cdot \ell) = \Omega(q_1 \cdot q_2)$ to ensure a sufficiently small indifferentiability advantage. This in particular means that in the case where both $q_1$ and $q_2$ are large, the simulator must make a quadratic effort to prevent the attacker from distinguishing. Below, in Theorem 2, we show that this simulation effort is essentially optimal.

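In many scenarios, this quadratic lower bound happens to be a problem, as we now illustrate. As a concrete example, let $\mathcal{SS} = (\mathsf{key}, \mathsf{sign}, \mathsf{ver})$ be an arbitrary signature scheme signing $n$-bit messages, and let $\mathcal{SS}[\mathcal{R}] = (\mathsf{key}, \mathsf{sign}^{\mathcal{R}}, \mathsf{ver}^{\mathcal{R}})$ for $\mathcal{R}: \{0,1\}^* \to \{0,1\}^n$ be the scheme obtained via the hash-then-sign paradigm such that $\mathsf{sign}^{\mathcal{R}}(sk, m) = \mathsf{sign}(sk, \mathcal{R}(m))$. It is well known that for an adversary $\mathcal{B}$ making $q_{\mathrm{sign}}$ signing and $q_{\mathcal{R}}$ random oracle queries, there exists an adversary $\mathcal{C}$ making $q_{\mathrm{sign}}$ signing queries such that

$$\mathrm{Adv}^{\text{uf-cma}}_{\mathcal{SS}[\mathcal{R}]}(\mathcal{B}^{\mathcal{R}}) \le \frac{(q_{\mathrm{sign}} + q_{\mathcal{R}})^2}{2^n} + \mathrm{Adv}^{\text{uf-cma}}_{\mathcal{SS}}(\mathcal{C}), \qquad (1)$$

where $\mathrm{Adv}^{\text{uf-cma}}_{\mathcal{SS}[\mathcal{R}]}(\mathcal{B}^{\mathcal{R}})$ and $\mathrm{Adv}^{\text{uf-cma}}_{\mathcal{SS}}(\mathcal{C})$ denote the respective advantages in the standard uf-cma game for security of signature schemes (with and without a random oracle, respectively). This in particular means that $\mathcal{SS}[\mathcal{R}]$ is secure for $q_{\mathrm{sign}}$ and $q_{\mathcal{R}}$ as large as $\Theta(2^{n/2})$, provided $\mathcal{SS}$ is secure for $q_{\mathrm{sign}}$ signing queries. However, let us now replace $\mathcal{R}$ by $H^2[P]$ for an arbitrary construction $H = H[P]$. Then, for all adversaries $\mathcal{A}$ making $q_P$ queries to $P$ and $q_{\mathrm{sign}}$ signing queries, we can combine the concrete version of the MRH composition theorem proven in [31] and (1) to infer that there exists an adversary $\mathcal{C}$ and a distinguisher $\mathcal{D}$ such that

$$\mathrm{Adv}^{\text{uf-cma}}_{\mathcal{SS}[H^2[P]]}(\mathcal{A}^P) \le O\!\left(\frac{(q_{\mathrm{sign}} \cdot q_P)^2}{2^n}\right) + \mathrm{Adv}^{\text{uf-cma}}_{\mathcal{SS}}(\mathcal{C}) + \mathrm{Adv}^{\mathrm{indiff}}_{H^2[P],\mathcal{R},S}(\mathcal{D}),$$

where $\mathcal{C}$ makes $q_{\mathrm{sign}}$ signing queries. Note that even if the term $\mathrm{Adv}^{\mathrm{indiff}}_{H^2[P],\mathcal{R},S}(\mathcal{D})$ is really small, this new bound can only ensure security for the resulting signature scheme as long as $q_{\mathrm{sign}} \cdot q_P = O(2^{n/2})$, i.e., if $q_{\mathrm{sign}} = q_P$, we only get security up to $O(2^{n/4})$ queries, a remarkable loss with respect to the security bound in the random oracle model.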
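As a worked instance of this loss, take illustrative numbers of our own choosing, $n = 256$ and $q_{\mathrm{sign}} = q_P = 2^{64}$, and ignore the constant hidden in $O(\cdot)$:

$$\frac{(q_{\mathrm{sign}} \cdot q_P)^2}{2^n} = \frac{(2^{64} \cdot 2^{64})^2}{2^{256}} = \frac{2^{256}}{2^{256}} = 1, \qquad 2^{n/4} = 2^{64}, \qquad 2^{n/2} = 2^{128}.$$

So the composed bound is already vacuous at $2^{64}$ queries, whereas the random-oracle analysis tolerates query counts up to about $2^{128}$.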

We note that, of course, this does not mean that $H^2[P]$ for a concrete $H$ and $P$ is unsuitable for a certain application, such as hash-then-sign. In fact, $H^2[P]$ may well be optimally collision resistant. However, our result shows that a sufficiently strong security level cannot be inferred from any indifferentiability statement via the composition theorem, taking us back to a direct ad-hoc analysis and completely losing the one main advantage of having indifferentiability in the first place.

Upper bound. Our negative results do not rule out positive results completely: there could be indifferentiability upper bounds, though for simulators that make around $O(q^2)$ queries. Ideally, we would like upper bounds that closely match the lower bounds given in prior sections. We do so for the special case of $H^2[g](M) = g(g(M))$ for $g: \{0,1\}^n \to \{0,1\}^n$ being a RO.

Theorem 2. Let $q_1, q_2 > 0$ and $N = 2^n$. Let $g: \{0,1\}^n \to \{0,1\}^n$ and $\mathcal{R}: \{0,1\}^n \to \{0,1\}^n$ be uniform random functions. Then there exists a simulator $S$ such that

$$\mathrm{Adv}^{\mathrm{indiff}}_{H^2[g],\mathcal{R},S}(\mathcal{D}) \le \frac{2\big((4q_1 + 3)q_2 + 2q_1\big)^2}{N} + \frac{2\big((4q_1 + 3)q_2 + 2q_1\big)(q_1 + q_2)}{N - 2q_2 - 2q_1}$$

for any adversary $\mathcal{D}$ making at most $q_1$ queries to its left oracle and at most $q_2$ queries to its right oracle. Moreover, for each query answer that it computes, $S$ makes at most $3q_1 + 1$ queries to RO and runs in time $O(q_1)$. □

The proof of the theorem appears in the full version of the paper. We note that the simulator used must know the maximum number of queries the attacker will make, but does not otherwise depend on the adversary's strategy. The security bound of the theorem is approximately $(q_1 q_2)^2/N$, implying that security holds up to $q_1 q_2 \approx 2^{n/2}$.

4  HMAC  as  a  General-purpose  Keyed  Hash  Function 

HMAC [5] uses a hash function to build a keyed hash function, i.e., one that takes both a key and a message as input. Fix some hash function⁷ $H: \{0,1\}^* \to \{0,1\}^n$. HMAC assumes this function $H$ is built by iterating an underlying compression function with a message block size of $d > n$ bits. We define the following functions:

$$F_K(M) = H\big((\rho(K) \oplus \mathrm{ipad}) \,\|\, M\big), \qquad G_K(M) = H\big((\rho(K) \oplus \mathrm{opad}) \,\|\, M\big),$$

where

$$\rho(K) = \begin{cases} H(K) & \text{if } |K| > d, \\ K & \text{otherwise.} \end{cases}$$

The two constants used are $\mathrm{ipad} = \mathtt{0x36}^{d/8}$ and $\mathrm{opad} = \mathtt{0x5c}^{d/8}$. These constants are given in hexadecimal; translating to binary gives $\mathtt{0x36} = 00110110_2$ and $\mathtt{0x5c} = 01011100_2$. Recall that we have defined the $\oplus$ operator so that, if $|K| < d$, it first silently pads out the shorter string by sufficiently many zeros before computing the bitwise xor. It will also be convenient to define $\mathrm{xpad} = \mathrm{ipad} \oplus \mathrm{opad}$. The function $\mathrm{HMAC}: \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^n$ is defined by

$$\mathrm{HMAC}(K, M) = G_K(F_K(M)) = (G_K \circ F_K)(M).$$

We sometimes write $\mathrm{HMAC}_d[P]$, $\mathrm{HMAC}_d$, or $\mathrm{HMAC}[P]$ instead of HMAC when we want to make the reliance on the block size and/or an underlying ideal primitive explicit.

⁷ RFC 2104 defines HMAC over strings of bytes, but we chose to use bits to provide more general positive results; all our negative results lift to a setting in which only byte strings are used. Note also that for simplicity we assumed $H$ with domain $\{0,1\}^*$. In practice hash functions often do have some maximal length (e.g., $2^{64}$), and in this case HMAC must be restricted to smaller lengths.
In the following sections, we will therefore analyze the security of HMAC in the sense of being indifferentiable from a keyed RO. As we will see, the story is more involved than one might expect.


4.1  Weak  Key  Pairs  in  HMAC 

Towards  understanding  the  indifferentiability  of  HMAC,  we  start  by  observing 
that  the  way  HMAC  handles  keys  gives  rise  to  two  worrisome  classes  of  weak 
key  pairs. 

Colliding keys. We say that keys $K \neq K'$ collide if $\rho(K) \,\|\, 0^{d - |\rho(K)|} = \rho(K') \,\|\, 0^{d - |\rho(K')|}$. For any message $M$ and colliding keys $K, K'$ it holds that $\mathrm{HMAC}(K, M) = \mathrm{HMAC}(K', M)$. Colliding keys exist because of HMAC's ambiguous encoding of different-length keys. Examples of colliding keys include any $K, K'$ for which $|K| < d$ and $K' = K \,\|\, 0^s$ where $1 \le s \le d - |K|$, or any $K, K'$ such that $|K| > d$ and $K' = H(K)$. As long as $H$ is collision-resistant, two keys of the same length can never collide.

Colliding keys enable a simple attack against indifferentiability. Consider $\mathrm{HMAC}[P]$ for any underlying function $P$. Then let $\mathcal{A}$ pick two keys $K \neq K'$ that collide and an arbitrary message $M$. It queries its Func oracle on $(K, M)$ and $(K', M)$ to retrieve two values $Y, Y'$. If $Y = Y'$ then it returns 1 (guessing that it is in game $\mathrm{Real}_{\mathrm{HMAC}[P],\mathcal{R}}$) and returns 0 otherwise (guessing that it is in game $\mathrm{Ideal}_{\mathcal{R},S}$). The advantage of $\mathcal{A}$ is equal to $1 - 2^{-n}$ regardless of the simulator $S$, which is never invoked.
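The colliding-key attack can be reproduced at the byte level, which the footnote above notes is enough for the negative results. The C++ sketch below is ours and rests on stated assumptions: the non-cryptographic FNV-1a-64 hash (hypothetical `toy_H`) stands in for $H$, so $n = 64$; the block size is assumed to be $d = 64$ bytes; all helper names are illustrative. The collisions come from the key encoding $\rho$ and the zero padding, so the choice of hash does not matter for them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy stand-in for H: FNV-1a-64 with an 8-byte output (n = 64). NOT cryptographic.
Bytes toy_H(const Bytes& in) {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    for (std::uint8_t b : in) {
        h ^= b;
        h *= 0x100000001b3ULL;
    }
    Bytes out(8);
    for (int i = 0; i < 8; ++i) out[i] = static_cast<std::uint8_t>(h >> (8 * i));
    return out;
}

constexpr std::size_t kBlockBytes = 64;  // block size d (assumed: 64 bytes)

// rho(K): keys longer than one block are hashed, shorter keys are kept as-is.
Bytes rho(const Bytes& K) { return K.size() > kBlockBytes ? toy_H(K) : K; }

// H((rho(K) XOR pad^d) || M), with rho(K) implicitly zero-padded to d bytes.
Bytes keyed_hash(const Bytes& K, std::uint8_t pad, const Bytes& M) {
    Bytes r = rho(K), in(kBlockBytes, pad);
    for (std::size_t i = 0; i < r.size(); ++i) in[i] ^= r[i];
    in.insert(in.end(), M.begin(), M.end());
    return toy_H(in);
}

Bytes F(const Bytes& K, const Bytes& M) { return keyed_hash(K, 0x36, M); }  // ipad
Bytes G(const Bytes& K, const Bytes& M) { return keyed_hash(K, 0x5c, M); }  // opad
Bytes hmac(const Bytes& K, const Bytes& M) { return G(K, F(K, M)); }
```

For example, the keys `{'k','e','y'}` and `{'k','e','y',0x00}` are distinct yet give identical tags on every message, and any key longer than 64 bytes collides with its own hash `toy_H(K)`.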

Note that this result extends directly to rule out related-key attack security [6] of HMAC as a PRF should a related-key function be available that enables deriving colliding keys.

Ambiguous keys. A pair of keys $K \neq K'$ is ambiguous if $\rho(K) \oplus \mathrm{ipad} = \rho(K') \oplus \mathrm{opad}$. For any $X$, both $F_K(X) = G_{K'}(X)$ and $G_K(X) = F_{K'}(X)$ when $K, K'$ are ambiguous. An example of such a pair is $K, K'$ of length $d$ bits for which $K \oplus K' = \mathrm{xpad}$.

For any key $K$, there exists one key $K'$ that is easily computable and for which $K, K'$ are ambiguous: set $K' = \rho(K) \oplus \mathrm{xpad}$. Finding a third key $K''$ that is also ambiguous with $K$ is intractable should $H$ be collision resistant. The easily computable $K'$ will not necessarily have the same length as $K$. In fact, there exist ambiguous key pairs of the same length $k$ only when $k \in \{d-1, d\}$. For a fixed length shorter than $d - 1$, no ambiguous key pairs exist due to the fact that the second least significant bit of xpad is 1. For a fixed length longer than $d$ bits, if $n < d - 1$ then no ambiguous key pairs exist, and if $n \ge d - 1$ then producing ambiguous key pairs would require finding $K, K'$ such that $H(K) \oplus H(K')$ equals the first $n$ bits of xpad. This is intractable for any reasonable hash function $H$.

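Ambiguous key pairs give rise to a chain-shift-like property. Let $M$ be some message and $K, K'$ be an ambiguous key pair. Then, we have that $\rho(K') = \rho(K) \oplus \mathrm{xpad}$, and so $F_K(M) = G_{K'}(M)$ and likewise $F_{K'}(M) = G_K(M)$. Thus,

$$\mathrm{HMAC}(K', F_K(M)) = G_{K'}\big(F_{K'}(F_K(M))\big) = G_{K'}\big(G_K(F_K(M))\big) = G_{K'}(\mathrm{HMAC}(K, M)).$$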
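The ambiguous-key identities can be checked mechanically. This C++ sketch is illustrative only, with our own assumptions: FNV-1a-64 (hypothetical `toy_H`, not cryptographic) stands in for $H$, the block size is assumed to be $d = 64$ bytes, and `ambiguous_partner` is a made-up name for the map $K \mapsto \rho(K) \oplus \mathrm{xpad}$. It also exposes $\mathrm{xpad} = \mathtt{0x6a} = 01101010_2$, whose second least significant bit is the one the length argument above relies on.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy stand-in for H: FNV-1a-64 with an 8-byte output (n = 64). NOT cryptographic.
Bytes toy_H(const Bytes& in) {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    for (std::uint8_t b : in) {
        h ^= b;
        h *= 0x100000001b3ULL;
    }
    Bytes out(8);
    for (int i = 0; i < 8; ++i) out[i] = static_cast<std::uint8_t>(h >> (8 * i));
    return out;
}

constexpr std::size_t kBlockBytes = 64;        // block size d (assumed)
constexpr std::uint8_t kIpad = 0x36, kOpad = 0x5c;
constexpr std::uint8_t kXpad = kIpad ^ kOpad;  // 0x6a = 01101010 in binary

Bytes rho(const Bytes& K) { return K.size() > kBlockBytes ? toy_H(K) : K; }

// rho(K) zero-padded to d bytes, XORed with the pad byte repeated d times.
Bytes padded_key(const Bytes& K, std::uint8_t pad) {
    Bytes r = rho(K), out(kBlockBytes, pad);
    for (std::size_t i = 0; i < r.size(); ++i) out[i] ^= r[i];
    return out;
}

Bytes keyed_hash(const Bytes& K, std::uint8_t pad, const Bytes& M) {
    Bytes in = padded_key(K, pad);
    in.insert(in.end(), M.begin(), M.end());
    return toy_H(in);
}

Bytes F(const Bytes& K, const Bytes& M) { return keyed_hash(K, kIpad, M); }
Bytes G(const Bytes& K, const Bytes& M) { return keyed_hash(K, kOpad, M); }
Bytes hmac(const Bytes& K, const Bytes& M) { return G(K, F(K, M)); }

// The easily computable partner K' = rho(K) XOR xpad (always d bytes long).
Bytes ambiguous_partner(const Bytes& K) { return padded_key(K, kXpad); }
```

With `Kp = ambiguous_partner(K)`, the calls `F(K, M)` and `G(Kp, M)` hash exactly the same byte string, so the equalities hold for any hash, not just the toy one.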
[Figure 3: two keyed hash chains built from alternating $F_K$ and $G_K$ steps, $\mathrm{HMAC}^\ell(K, Y_0)$ through $Y_1, \ldots, Y_\ell$ and $\mathrm{HMAC}^\ell(K', Y'_0)$ through $Y'_1, \ldots, Y'_\ell$.]

Fig. 3. Diagram of two hash chains $(K, Y) = (Y_0, \ldots, Y_\ell)$ and $(K', Y') = (Y'_0, \ldots, Y'_\ell)$ for HMAC where $\rho(K') = \rho(K) \oplus \mathrm{xpad}$.


As with $H^2$, this property gives rise to problems in the context of hash chains. A hash chain $Y = (K, Y_0, \ldots, Y_\ell)$ is a key $K$, a message $Y_0$, and a sequence of $\ell$ values $Y_i = H(K, Y_{i-1})$ for $1 \le i \le \ell$. So a keyed hash chain $Y = (K, Y_0, \ldots, Y_\ell)$ for HMAC has $Y_i = \mathrm{HMAC}(K, Y_{i-1})$ for $1 \le i \le \ell$. Given $K, Y_0, Y_\ell$ for a chain $Y = (K, Y_0, \ldots, Y_\ell)$, it is easy for an adversary to compute the start and end of a new chain $Y' = (K', Y'_0, \ldots, Y'_\ell)$ that does not overlap with $Y$. See Figure 3. In the full version, we detail how this structure can be abused in the context of an HMAC-based mutual proofs of work protocol. We also give an analogue of Theorem 1, i.e., a lower bound on the indifferentiability of HMAC from a RO when ambiguous key pairs can be queried.
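One way to compute the new chain's endpoints follows from the chain-shift-like property: with $K' = \rho(K) \oplus \mathrm{xpad}$ we have $G_{K'} = F_K$, so $\mathrm{HMAC}(K', F_K(M)) = F_K(\mathrm{HMAC}(K, M))$, and by induction $Y'_i = F_K(Y_i)$ when $Y'_0 = F_K(Y_0)$. The C++ sketch below illustrates this under our own assumptions (FNV-1a-64 as a non-cryptographic stand-in for $H$, block size $d = 64$ bytes, hypothetical helper names such as `shifted_chain`).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy stand-in for H: FNV-1a-64 with an 8-byte output (n = 64). NOT cryptographic.
Bytes toy_H(const Bytes& in) {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    for (std::uint8_t b : in) {
        h ^= b;
        h *= 0x100000001b3ULL;
    }
    Bytes out(8);
    for (int i = 0; i < 8; ++i) out[i] = static_cast<std::uint8_t>(h >> (8 * i));
    return out;
}

constexpr std::size_t kBlockBytes = 64;  // block size d (assumed)
constexpr std::uint8_t kIpad = 0x36, kOpad = 0x5c, kXpad = kIpad ^ kOpad;

Bytes rho(const Bytes& K) { return K.size() > kBlockBytes ? toy_H(K) : K; }

Bytes padded_key(const Bytes& K, std::uint8_t pad) {
    Bytes r = rho(K), out(kBlockBytes, pad);
    for (std::size_t i = 0; i < r.size(); ++i) out[i] ^= r[i];
    return out;
}

Bytes keyed_hash(const Bytes& K, std::uint8_t pad, const Bytes& M) {
    Bytes in = padded_key(K, pad);
    in.insert(in.end(), M.begin(), M.end());
    return toy_H(in);
}

Bytes F(const Bytes& K, const Bytes& M) { return keyed_hash(K, kIpad, M); }
Bytes G(const Bytes& K, const Bytes& M) { return keyed_hash(K, kOpad, M); }
Bytes hmac(const Bytes& K, const Bytes& M) { return G(K, F(K, M)); }

// Y_l for the keyed chain Y_i = HMAC(K, Y_{i-1}).
Bytes hmac_chain_end(const Bytes& K, Bytes Y, int ell) {
    for (int i = 0; i < ell; ++i) Y = hmac(K, Y);
    return Y;
}

struct ShiftedChain { Bytes key, start, end; };

// From K, Y_0 and Y_l alone: K' = rho(K) XOR xpad, Y'_0 = F_K(Y_0), Y'_l = F_K(Y_l).
ShiftedChain shifted_chain(const Bytes& K, const Bytes& Y0, const Bytes& Yell) {
    return {padded_key(K, kXpad), F(K, Y0), F(K, Yell)};
}
```

Iterating `hmac` under `key` for $\ell$ steps from `start` lands exactly on `end`, even though `shifted_chain` itself never evaluates the new chain.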

4.2  Indifferentiability  of  HMAC  with  Restricted  Keys 

We  have  seen  that  HMAC’s  construction  gives  rise  to  two  kinds  of  weak  key  pairs 
that  can  be  abused  to  show  that  HMAC  is  not  indifferentiable  from  a  keyed  RO 
(with  good  bounds).  But  weak  key  pairs  are  serendipitously  avoided  in  most 
applications. For example, the recommended usage of HKDF [28] specifies keys of a fixed length less than $d - 1$. Neither kind of weak key pair exists within this subset of the key space.

While one can show indifferentiability for a variety of settings in which weak key pairs are avoided, we focus for simplicity on the case mentioned above. That is, we restrict to keys $K$ for which $|K| = k$ and $k$ is a fixed integer less than $d - 1$. The full version provides a more general set of results, covering also, for example, use of HMAC with a fixed key of any length less than or equal to $d$.

As  our  first  positive  result,  we  have  the  following  theorem,  which  establishes 
the  security  of  HMAC  when  modeling  the  underlying  hash  function  as  a  RO. 

Theorem 3. Fix $d, k, n > 0$ with $k < d - 1$. Let $P: \{0,1\}^* \to \{0,1\}^n$ be a RO, and consider $\mathrm{HMAC}_d[P]$ restricted to $k$-bit keys. Let $\mathcal{R}: \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^n$ be a keyed RO. Then there exists a simulator $S$ such that for any distinguisher $\mathcal{A}$ whose total query cost is $\sigma$ it holds that

$$\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{HMAC}_d[P],\mathcal{R},S}(\mathcal{A}) \le O\!\left(\frac{\sigma^2}{2^n}\right).$$

$S$ makes at most $q_2$ queries and runs in time $O(q_2 \log q_2)$ where $q_2$ is the number of Prim queries made by $\mathcal{A}$. □
The use of $O(\cdot)$ just hides small constants. The proof is given in the full version. Combining Theorem 3 with the indifferentiability composition theorem allows us to conclude security for $\mathrm{HMAC}_d[H]$ for an underlying hash function $H$ that is, itself, indifferentiable from a RO, for example, should $H$ be one of the proven-indifferentiable SHA-3 candidates. This does not, however, give us a security guarantee should $H$ not be indifferentiable from a RO, as is the case with MD-based hash functions. We therefore also prove, in the full version, the following theorem that establishes indifferentiability of HMAC using an underlying hash function built via the strengthened Merkle-Damgård (SMD) domain extension transform.

Theorem 4. Fix $d, k, n > 0$ with $k < d - 1$ and $d > n$. Let $f: \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^n$ be a RO and consider $\mathrm{HMAC}_d[\mathrm{SMD}[f]]$ restricted to $k$-bit keys. Let $\mathcal{R}: \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^n$ be a keyed RO. Then there exists a simulator $S$ such that for any distinguisher $\mathcal{A}$ whose total query cost is $\sigma < 2^{n-2}$ it holds that

$$\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{HMAC}_d[\mathrm{SMD}[f]],\mathcal{R},S}(\mathcal{A}) \le O\!\left(\frac{\sigma^2}{2^n}\right).$$

$S$ makes at most $q_2$ queries and runs in time $O(q_2 \log q_2)$ where $q_2$ is the number of Prim queries by $\mathcal{A}$. □

We note that the restriction to $\sigma < 2^{n-2}$ in the theorem statement is just a technicality to make the bound simpler, and likewise the use of $O(\cdot)$ in the advantage statement hides just a small constant.

Unlike our positive results about $H^2$, the bounds provided by Theorems 3 and 4 match, up to small constants, results for other now-standard indifferentiable constructions (cf. [13]). First, the advantage bounds both hold up to the birthday bound, namely $\sigma \approx 2^{n/2}$. Second, the simulators are efficient and, specifically, make at most one query per invocation. All this enables use of the indifferentiability composition theorem in a way that yields strong, standard concrete security bounds.
Abstract

We describe the design and implementation of a software library that implements the Brakerski-Gentry-Vaikuntanathan (BGV) homomorphic encryption scheme, along with many optimizations to make homomorphic evaluation run faster, focusing mostly on effective use of the Smart-Vercauteren ciphertext packing techniques. Our library is written in C++ and uses the NTL mathematical library.
Organization of This Report


We begin in Section 1 with a brief high-level overview of the BGV cryptosystem and some important features of the variant that we implemented and our choice of representation, as well as an overview of the structure of our library. Then in Sections 2, 3, and 4 we give a bottom-up detailed description of all the modules in the library. We conclude in Section 5 with some examples of using this library.

1  The  BGV  Homomorphic  Encryption  Scheme 

A homomorphic encryption scheme [8, 3] allows processing of encrypted data even without knowing the secret decryption key. In this report we describe the design and implementation of a software library that we wrote to implement the Brakerski-Gentry-Vaikuntanathan (BGV) homomorphic encryption scheme [2]. We begin with a high-level description of the BGV variant that we implemented, followed by a detailed description of the various software components in our implementation. The description in this section is mostly taken from the full version of [5].

Below we denote by $[\cdot]_q$ the reduction-mod-$q$ function, namely mapping an integer $z \in \mathbb{Z}$ to the unique representative of its equivalence class modulo $q$ in the interval $(-q/2, q/2]$. We use the same notation for modular reduction of vectors, matrices, and polynomials (in coefficient representation).
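The reduction $[\cdot]_q$ is easy to get subtly wrong at the interval endpoints, so here is a minimal C++ sketch (function names are ours, not the library's). It assumes $q \ge 1$ and relies on the C++ rule that `%` yields a remainder with the sign of the dividend.

```cpp
#include <cstdint>
#include <vector>

// [z]_q: the representative of z modulo q in the interval (-q/2, q/2].
// Assumes q >= 1.
std::int64_t centered_mod(std::int64_t z, std::int64_t q) {
    std::int64_t r = z % q;  // C++ remainder: r in (-q, q), sign follows z
    if (r < 0) r += q;       // now r in [0, q)
    if (2 * r > q) r -= q;   // map (q/2, q) down to (-q/2, 0)
    return r;
}

// Coefficient-wise reduction, as used for vectors and polynomials.
std::vector<std::int64_t> centered_mod(std::vector<std::int64_t> v, std::int64_t q) {
    for (auto& c : v) c = centered_mod(c, q);
    return v;
}
```

For example, `centered_mod(11, 7)` returns $-3$, while `centered_mod(4, 8)` returns $4$, the closed right end of $(-q/2, q/2]$.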

Our BGV variant is defined over polynomial rings of the form $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$ where $m$ is a parameter and $\Phi_m(X)$ is the $m$'th cyclotomic polynomial. The “native” plaintext space for this scheme is usually the ring $\mathbb{A}_2 = \mathbb{A}/2\mathbb{A}$, namely binary polynomials modulo $\Phi_m(X)$. (Our implementation supports other plaintext spaces as well, but in this report we mainly describe the case of plaintext space $\mathbb{A}_2$. See some more details in Section 2.4.) We use the Smart-Vercauteren CRT-based encoding technique [10] to “pack” a vector of bits in a binary polynomial, so that polynomial arithmetic in $\mathbb{A}_2$ translates to entry-wise arithmetic on the packed bits.

The ciphertext space for this scheme consists of vectors over $\mathbb{A}_q = \mathbb{A}/q\mathbb{A}$, where $q$ is an odd modulus that evolves with the homomorphic evaluation. Specifically, the system is parametrized by a “chain” of moduli of decreasing size, $q_0 > q_1 > \cdots > q_L$, and freshly encrypted ciphertexts are defined over $\mathbb{A}_{q_0}$. During homomorphic evaluation we keep switching to smaller and smaller moduli until we get ciphertexts over $\mathbb{A}_{q_L}$, on which we cannot compute anymore. We call ciphertexts that are defined over $\mathbb{A}_{q_i}$ “level-$i$ ciphertexts”. These level-$i$ ciphertexts are 2-element vectors over $\mathbb{A}_{q_i}$, i.e., $\mathbf{c} = (c_0, c_1) \in (\mathbb{A}_{q_i})^2$.

Secret keys are polynomials $s \in \mathbb{A}$ with “small” coefficients, and we view $s$ as the second element of the 2-vector $\mathbf{s} = (1, s)$. A level-$i$ ciphertext $\mathbf{c} = (c_0, c_1)$ encrypts a plaintext polynomial $m \in \mathbb{A}_2$ with respect to $\mathbf{s} = (1, s)$ if we have the equality over $\mathbb{A}$, $[\langle \mathbf{c}, \mathbf{s} \rangle]_{q_i} = [c_0 + s \cdot c_1]_{q_i} \equiv m \pmod 2$, and moreover the polynomial $[c_0 + s \cdot c_1]_{q_i}$ is “small”, i.e., all its coefficients are considerably smaller than $q_i$. Roughly, that polynomial is considered the “noise” in the ciphertext, and its coefficients grow as homomorphic operations are performed. We note that the crux of the noise-control technique from [2] is that a level-$i$ ciphertext can be publicly converted into a level-$(i+1)$ ciphertext (with respect to the same secret key), and that this transformation reduces the noise in the ciphertext roughly by a factor of $q_{i+1}/q_i$.

Following [7, 4, 5], we think of the “size” of a polynomial $a \in \mathbb{A}$ as the norm of its canonical embedding. Recall that the canonical embedding of $a \in \mathbb{A}$ into $\mathbb{C}^{\phi(m)}$ is the $\phi(m)$-vector of complex numbers $\sigma(a) = (a(\tau_m^j))_j$, where $\tau_m$ is a complex primitive $m$-th root of unity ($\tau_m = e^{2\pi i/m}$) and the indexes $j$ range over all of $\mathbb{Z}_m^*$. We denote the $\ell_\infty$-norm of the canonical embedding of $a$ by $\|a\|_\infty^{\mathrm{canon}}$.

16. Design and Implementation of a Homomorphic-Encryption Library

The  basic  operations  that  we  have  in  this  scheme  are  the  usual  key-generation,  encryption,  and 
decryption,  the  homomorphic  evaluation  routines  for  addition,  multiplication  and  automorphism 
(and  also  addition-of-constant  and  multiplication-by-constant),  and  the  “ciphertext  maintenance” 
operations  of  key-switching  and  modulus-switching.  These  are  described  in  the  rest  of  this  report, 
but  first  we  describe  our  plaintext  encoding  conventions  and  our  Double-CRT  representation  of 
polynomials. 

1.1  Plaintext  Slots 

The native plaintext space of our variant of BGV consists of elements of $\mathbb{A}_2$, and the polynomial $\Phi_m(X)$ factors modulo 2 into $\ell$ irreducible factors, $\Phi_m(X) = F_1(X) \cdot F_2(X) \cdots F_\ell(X) \pmod 2$, all of degree $d = \phi(m)/\ell$. Just as in [2, 4, 10], each factor corresponds to a “plaintext slot”. That is, we can view a polynomial $a \in \mathbb{A}_2$ as representing an $\ell$-vector $(a \bmod F_i)_{i=1}^{\ell}$.
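The number of slots for a given $m$ can be computed directly. This C++ sketch is ours (hypothetical names, not the library's API) and uses a standard fact not spelled out above: for odd $m$, every irreducible factor of $\Phi_m(X)$ modulo 2 has degree equal to the multiplicative order of 2 modulo $m$, so $d = \mathrm{ord}_m(2)$ and $\ell = \phi(m)/d$.

```cpp
// Greatest common divisor (Euclid).
long gcd_long(long a, long b) {
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Euler's totient phi(m), by direct counting (fine for small m).
long euler_phi(long m) {
    long count = 0;
    for (long j = 1; j <= m; ++j)
        if (gcd_long(j, m) == 1) ++count;
    return count;
}

// Multiplicative order of a modulo m; assumes m > 1 and gcd(a, m) == 1.
long mult_order(long a, long m) {
    long x = a % m, k = 1;
    while (x != 1) {
        x = x * a % m;
        ++k;
    }
    return k;
}

struct SlotStructure {
    long phi;    // phi(m) = degree of Phi_m(X)
    long d;      // degree of each irreducible factor of Phi_m(X) mod 2
    long slots;  // number of factors, i.e. plaintext slots (ell)
};

// Assumes m is odd and m > 1, so that 2 is invertible modulo m.
SlotStructure slot_structure(long m) {
    long phi = euler_phi(m);
    long d = mult_order(2, m);
    return {phi, d, phi / d};
}
```

For example, `slot_structure(257)` gives $\phi(m) = 256$ and $d = 16$ (since $2^8 \equiv -1 \pmod{257}$), i.e., 16 slots, each holding an element of $\mathbb{F}_{2^{16}}$.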

More specifically, for the purpose of packing we think of a polynomial $a \in \mathbb{A}_2$ not as a binary polynomial but as a polynomial over the extension field $\mathbb{F}_{2^d}$ (with some specific representation), and the plaintext values that are encoded in $a$ are its evaluations at $\ell$ specific primitive $m$-th roots of unity in $\mathbb{F}_{2^d}$. In other words, if $\rho \in \mathbb{F}_{2^d}$ is a particular fixed primitive $m$-th root of unity, and our distinguished evaluation points are $\rho^{t_1}, \rho^{t_2}, \ldots, \rho^{t_\ell}$ (for some set of indexes $T = \{t_1, \ldots, t_\ell\}$), then the vector of plaintext values encoded in $a$ is:

$$\big(a(\rho^{t_j}) : t_j \in T\big).$$

See Section 2.4 for a discussion of the choice of representation of $\mathbb{F}_{2^d}$ and the evaluation points.

It is a standard fact that the Galois group $\mathcal{G}al = \mathrm{Gal}(\mathbb{Q}(\tau_m)/\mathbb{Q})$ consists of the mappings $\kappa_k: a(X) \mapsto a(X^k) \bmod \Phi_m(X)$ for all $k$ co-prime with $m$, and that it is isomorphic to $\mathbb{Z}_m^*$. As noted in [4], for each $i, j \in \{1, 2, \ldots, \ell\}$ there is an element $\kappa_k \in \mathcal{G}al$ which sends an element in slot $i$ to an element in slot $j$. Indeed, if we set $k = t_j^{-1} \cdot t_i \pmod m$ and $b = \kappa_k(a)$ then we have

$$b(\rho^{t_j}) = a(\rho^{t_j k}) = a(\rho^{t_j \cdot t_j^{-1} \cdot t_i}) = a(\rho^{t_i}),$$

so the element in the $j$'th slot of $b$ is the same as that in the $i$'th slot of $a$. In addition to these “data-movement maps”, $\mathcal{G}al$ also contains the Frobenius maps, $X \to X^{2^i}$, which also act as Frobenius on the individual slots separately.
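The data-movement identity can be checked numerically on a toy analogue. The C++ sketch below makes one substitution, stated plainly: instead of $\mathbb{F}_{2^d}$ it works in $\mathbb{Z}_p$ with $p \equiv 1 \pmod m$, where a primitive $m$-th root of unity $\rho$ exists, so every index in $\mathbb{Z}_m^*$ is a slot. It also applies $\kappa_k$ modulo $X^m - 1$ rather than $\Phi_m(X)$; since $\Phi_m(X)$ divides $X^m - 1$, values at primitive $m$-th roots of unity are unaffected. All function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using i64 = long long;

i64 pow_mod(i64 b, i64 e, i64 p) {
    i64 r = 1;
    b %= p;
    while (e > 0) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

// Inverse of a modulo m (extended Euclid); assumes gcd(a, m) = 1.
i64 inv_mod(i64 a, i64 m) {
    i64 old_r = a % m, r = m, old_s = 1, s = 0;
    while (r != 0) {
        i64 q = old_r / r, t = old_r - q * r;
        old_r = r;
        r = t;
        t = old_s - q * s;
        old_s = s;
        s = t;
    }
    return ((old_s % m) + m) % m;
}

// a(x) mod p by Horner's rule; coefficients low degree first, in [0, p).
i64 eval_mod(const std::vector<i64>& a, i64 x, i64 p) {
    i64 r = 0;
    for (auto it = a.rbegin(); it != a.rend(); ++it) r = (r * x + *it) % p;
    return r;
}

// kappa_k(a)(X) = a(X^k), reduced modulo X^m - 1 and modulo p.
std::vector<i64> apply_kappa(const std::vector<i64>& a, i64 k, i64 m, i64 p) {
    std::vector<i64> b(static_cast<std::size_t>(m), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::size_t idx = static_cast<std::size_t>(static_cast<i64>(i) * k % m);
        b[idx] = (b[idx] + a[i]) % p;
    }
    return b;
}

// k = t_j^{-1} * t_i mod m moves slot i of a to slot j of kappa_k(a).
i64 slot_move_exponent(i64 t_i, i64 t_j, i64 m) { return inv_mod(t_j, m) * t_i % m; }
```

For instance, with $m = 5$, $p = 11$ and $\rho = 3$ (which has order 5 modulo 11), moving slot $t_i = 2$ to slot $t_j = 3$ uses $k = 3^{-1} \cdot 2 \equiv 4 \pmod 5$.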

We note that the values that are encoded in the slots do not have to be individual bits; in general they can be elements of the extension field $\mathbb{F}_{2^d}$ (or any subfield of it). For example, for the AES application we may want to pack elements of $\mathbb{F}_{2^8}$ in the slots, so we choose the parameters so that $\mathbb{F}_{2^8}$ is a subfield of $\mathbb{F}_{2^d}$ (which means that $d$ is divisible by 8).

1.2  Our  Modulus  Chain  and  Double-CRT  Representation 

We define the chain of moduli by choosing $L + 1$ “small primes” $p_0, p_1, \ldots, p_L$, and the $i$'th modulus in our chain is defined as $q_i = \prod_{j=0}^{i} p_j$. The primes $p_i$ are chosen so that for all $i$, $\mathbb{Z}/p_i\mathbb{Z}$ contains a primitive $m$-th root of unity (call it $\zeta_i$), so $\Phi_m(X)$ factors modulo $p_i$ into linear terms:

$$\Phi_m(X) = \prod_{j \in \mathbb{Z}_m^*} (X - \zeta_i^j) \pmod{p_i}.$$
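The prime-selection condition can be sketched as follows. This is a toy C++ sketch with tiny primes and our own helper names, not the library's procedure (which works with much larger primes); it relies on the standard fact that $\mathbb{Z}/p\mathbb{Z}$ contains a primitive $m$-th root of unity exactly when $m \mid p - 1$, and finds one by raising candidates to the power $(p-1)/m$.

```cpp
#include <vector>

using i64 = long long;

i64 pow_mod(i64 b, i64 e, i64 p) {
    i64 r = 1;
    b %= p;
    while (e > 0) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

bool is_prime(i64 n) {
    if (n < 2) return false;
    for (i64 d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Distinct prime factors of n (trial division).
std::vector<i64> prime_factors(i64 n) {
    std::vector<i64> f;
    for (i64 d = 2; d * d <= n; ++d)
        if (n % d == 0) {
            f.push_back(d);
            while (n % d == 0) n /= d;
        }
    if (n > 1) f.push_back(n);
    return f;
}

// A primitive m-th root of unity mod prime p (requires m | p - 1): z = x^((p-1)/m)
// satisfies z^m = 1, and has order exactly m iff z^(m/r) != 1 for each prime r | m.
i64 primitive_root_of_unity(i64 m, i64 p) {
    for (i64 x = 2; x < p; ++x) {
        i64 z = pow_mod(x, (p - 1) / m, p);
        bool primitive = true;
        for (i64 r : prime_factors(m))
            if (pow_mod(z, m / r, p) == 1) primitive = false;
        if (primitive) return z;
    }
    return -1;  // not reached when m | p - 1
}

struct ModulusChain { std::vector<i64> primes, roots, moduli; };

// The first L+1 primes p = 1 (mod m) above `start`, with q_i = p_0 * ... * p_i.
ModulusChain build_chain(i64 m, int L, i64 start) {
    ModulusChain c;
    i64 q = 1;
    for (i64 p = start + 1; static_cast<int>(c.primes.size()) <= L; ++p) {
        if (p % m != 1 || !is_prime(p)) continue;
        c.primes.push_back(p);
        c.roots.push_back(primitive_root_of_unity(m, p));
        q *= p;  // toy sizes only: no overflow handling
        c.moduli.push_back(q);
    }
    return c;
}
```

For $m = 5$, $L = 2$ and `start = 1`, the sketch picks $p_0, p_1, p_2 = 11, 31, 41$, giving $q_0 = 11$, $q_1 = 341$ and $q_2 = 13981$.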

A key feature of our implementation is that we represent an element $a \in \mathbb{A}_{q_i}$ via double-CRT representation, with respect to both the integer factors of $q_i$ and the polynomial factors of $\Phi_m(X) \bmod q_i$. A polynomial $a \in \mathbb{A}_{q_l}$ is represented as the $(l+1) \times \phi(m)$ matrix of its evaluations at the roots of $\Phi_m(X)$ modulo $p_i$ for $i = 0, \ldots, l$:

$$\mathrm{DoubleCRT}^{l}(a) = \Big(a(\zeta_i^j) \bmod p_i\Big)_{0 \le i \le l,\; j \in \mathbb{Z}_m^*}.$$

Addition and multiplication in $\mathbb{A}_q$ can be computed as component-wise addition and multiplication of the entries in the two tables (modulo the appropriate primes $p_i$),

$$\mathrm{DoubleCRT}^{l}(a + b) = \mathrm{DoubleCRT}^{l}(a) + \mathrm{DoubleCRT}^{l}(b),$$
$$\mathrm{DoubleCRT}^{l}(a \cdot b) = \mathrm{DoubleCRT}^{l}(a) \cdot \mathrm{DoubleCRT}^{l}(b).$$

Also, for an element of the Galois group $\kappa_k \in \mathcal{G}al$, mapping $a(X) \in \mathbb{A}$ to $a(X^k) \bmod \Phi_m(X)$, we can evaluate $\kappa_k(a)$ on the double-CRT representation of $a$ just by permuting the columns in the matrix, sending each column $j$ to column $j \cdot k \bmod m$.
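Both properties can be checked on a toy instance. The C++ sketch below is ours, not the library's DoubleCRT class: it uses hypothetical names and tiny parameters ($m = 5$, $p_0 = 11$ with $\zeta_0 = 4$, $p_1 = 31$ with $\zeta_1 = 2$). It builds the table of evaluations, and its helpers let one confirm that multiplication becomes entry-wise and that $\kappa_k$ only permutes columns, with column $j$ of $\kappa_k(a)$ equal to column $j \cdot k \bmod m$ of $a$.

```cpp
#include <cstddef>
#include <vector>

using i64 = long long;
using Poly = std::vector<i64>;                // integer coefficients, low degree first
using Table = std::vector<std::vector<i64>>;  // rows: primes p_i; columns: j in Z_m^*

i64 gcd_i64(i64 a, i64 b) {
    while (b != 0) {
        i64 t = a % b;
        a = b;
        b = t;
    }
    return a;
}

i64 pow_mod(i64 b, i64 e, i64 p) {
    i64 r = 1;
    b = ((b % p) + p) % p;
    while (e > 0) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

// a(x) mod p by Horner's rule (result in [0, p)).
i64 eval_mod(const Poly& a, i64 x, i64 p) {
    i64 r = 0;
    for (auto it = a.rbegin(); it != a.rend(); ++it) r = ((r * x + *it) % p + p) % p;
    return r;
}

// Column indexes: all j in [1, m) with gcd(j, m) = 1, in increasing order.
std::vector<i64> units(i64 m) {
    std::vector<i64> u;
    for (i64 j = 1; j < m; ++j)
        if (gcd_i64(j, m) == 1) u.push_back(j);
    return u;
}

// Entry (i, c) is a(zeta_i^{j_c}) mod p_i. Reducing a modulo Phi_m(X) first
// would not change these values, so the sketch skips that reduction.
Table double_crt(const Poly& a, i64 m, const std::vector<i64>& primes, const std::vector<i64>& roots) {
    Table t;
    for (std::size_t i = 0; i < primes.size(); ++i) {
        std::vector<i64> row;
        for (i64 j : units(m)) row.push_back(eval_mod(a, pow_mod(roots[i], j, primes[i]), primes[i]));
        t.push_back(row);
    }
    return t;
}

// Entry-wise product of two tables, each row modulo its own prime.
Table pointwise_mul(const Table& x, const Table& y, const std::vector<i64>& primes) {
    Table z = x;
    for (std::size_t i = 0; i < z.size(); ++i)
        for (std::size_t c = 0; c < z[i].size(); ++c) z[i][c] = x[i][c] * y[i][c] % primes[i];
    return z;
}

// Schoolbook product in Z[X]; assumes both inputs are non-empty.
Poly poly_mul(const Poly& a, const Poly& b) {
    Poly c(a.size() + b.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t k = 0; k < b.size(); ++k) c[i + k] += a[i] * b[k];
    return c;
}

// kappa_k(a)(X) = a(X^k), reduced modulo X^m - 1 (a multiple of Phi_m(X)).
Poly apply_kappa(const Poly& a, i64 k, i64 m) {
    Poly b(static_cast<std::size_t>(m), 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        b[static_cast<std::size_t>(static_cast<i64>(i) * k % m)] += a[i];
    return b;
}
```

In this representation a multiplication costs one modular product per table entry, which is the main reason for keeping polynomials in double-CRT form.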

1.3  Modules  in  our  Library 

Very roughly, our HE library consists of four layers: in the bottom layer we have modules for implementing mathematical structures and various other utilities; the second layer implements our Double-CRT representation of polynomials; the third layer implements the cryptosystem itself (with the “native” plaintext space of binary polynomials); and the top layer provides interfaces for using the cryptosystem to operate on arrays of plaintext values (using the plaintext slots as described in Section 1.1). We think of the bottom two layers as the “math layers”, and the top two layers as the “crypto layers”, and describe them in detail in Sections 2 and 3, respectively. A block-diagram description of the library is given in Figure 1. Roughly, the modules NumbTh, timing, bluestein, PAlgebra, PAlgebraModTwo, PAlgebraMod2r, Cmodulus, IndexSet and IndexMap belong to the bottom layer; FHEcontext, SingleCRT and DoubleCRT belong to the second layer; FHE, Ctxt and KeySwitching are in the third layer; and EncryptedArray and EncryptedArrayMod2r are in the top layer.

2  The  Math  Layers 

2.1  The  timing  module 

This module contains some utility functions for measuring the time that various methods take to execute. To use it, we insert the macro FHE_TIMER_START at the beginning of the method(s) that we want to time and FHE_TIMER_STOP at the end; then the main program needs to call the function setTimersOn() to activate the timers and setTimersOff() to pause them. We can have at most one timer per method/function, and the timer is called by the same name as the function itself (using the predefined variable __func__). To obtain the value of a given timer (in seconds), the application can use the function double getTime4func(const char *fncName), and the function printAllTimers() prints the values of all timers to the standard output.
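To show the mechanics of such a facility, here is a small self-contained imitation built on `std::chrono`. It is a hypothetical sketch, not the library's timing module: only the names `FHE_TIMER_START`, `FHE_TIMER_STOP`, `setTimersOn`, `setTimersOff`, `getTime4func` and `printAllTimers` come from the text, and the bodies (and the `toy_timing` namespace) are our own.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical imitation of the timer facility described in the text.
namespace toy_timing {

struct Timer {
    double total = 0.0;                             // accumulated seconds
    std::chrono::steady_clock::time_point start{};  // time of the last start
};

inline std::map<std::string, Timer>& timers() {  // one timer per function name
    static std::map<std::string, Timer> table;
    return table;
}

inline bool& enabled() {
    static bool on = false;
    return on;
}

inline void setTimersOn() { enabled() = true; }
inline void setTimersOff() { enabled() = false; }

inline void startTimer(const char* fn) {
    if (enabled()) timers()[fn].start = std::chrono::steady_clock::now();
}

inline void stopTimer(const char* fn) {
    if (!enabled()) return;
    auto it = timers().find(fn);
    if (it == timers().end()) return;  // never started while timers were on
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - it->second.start;
    it->second.total += dt.count();
}

inline double getTime4func(const char* fncName) {
    auto it = timers().find(fncName);
    return it == timers().end() ? 0.0 : it->second.total;
}

inline void printAllTimers() {
    for (const auto& entry : timers())
        std::printf("%s: %f s\n", entry.first.c_str(), entry.second.total);
}

}  // namespace toy_timing

// Each macro names its timer after the enclosing function via __func__.
#define FHE_TIMER_START toy_timing::startTimer(__func__)
#define FHE_TIMER_STOP toy_timing::stopTimer(__func__)
```

A function body bracketed by `FHE_TIMER_START;` and `FHE_TIMER_STOP;` then accumulates time under its own name whenever timers are on, and `toy_timing::getTime4func("name")` reads it back.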

2.2  NumbTh:  Miscellaneous  Utilities 

This module started out as an implementation of some number-theoretic algorithms (hence the
name), but has since grown to include many different small utility functions: for example,
CRT-reconstruction of polynomials in coefficient representation, conversion functions between
different types, procedures to sample at random from various distributions, etc.

Figure 1: A block diagram of the Homomorphic-Encryption library. The diagram shows the following
modules, each with the section that describes it:
  EncryptedArray/EncryptedArrayMod2r: routing plaintext slots, §4.1
  KeySwitching: matrices for key-switching, §3.3
  FHE: KeyGen/Enc/Dec, §3.2
  Ctxt: ciphertext operations, §3.1
  FHEcontext: parameters, §2.7
  SingleCRT/DoubleCRT: polynomial arithmetic, §2.8
  CModulus: polynomials mod p, §2.3
  PAlgebra2/PAlgebra2r: plaintext-slot algebra, §2.5
  IndexSet/IndexMap: indexing utilities, §2.6
  bluestein: FFT/IFFT, §2.3
  PAlgebra: structure of $\mathbb{Z}_m^*$, §2.4
  NumbTh: miscellaneous utilities, §2.2
  timing: §2.1
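CRT-reconstruction of a polynomial in coefficient representation, one of the NumbTh utilities mentioned above, works coefficient by coefficient. The sketch below is our own illustration rather than the NumbTh code: it combines the residues of each coefficient modulo two coprime moduli $p_1$ and $p_2$ into the residue modulo $p_1 p_2$. The names invMod and crtCombine are hypothetical, and the moduli are assumed small enough that all intermediate products fit in 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using i64 = std::int64_t;

// Modular inverse of a modulo m via the extended Euclidean algorithm.
// Assumes gcd(a, m) == 1 and m > 1.
i64 invMod(i64 a, i64 m) {
  i64 t = 0, newT = 1;
  i64 r = m, newR = ((a % m) + m) % m;
  while (newR != 0) {
    i64 q = r / newR;
    i64 tmp = t - q * newT; t = newT; newT = tmp;
    tmp = r - q * newR; r = newR; newR = tmp;
  }
  return ((t % m) + m) % m;
}

// Coefficient-wise CRT: given the coefficient vectors of the same polynomial
// modulo p1 and modulo p2 (entries in [0, p1) and [0, p2), equal lengths),
// return its coefficients modulo p1*p2, in [0, p1*p2). Assumes p1, p2 coprime
// and small enough that p2*p2 and p1*p2 fit in 64 bits.
std::vector<i64> crtCombine(const std::vector<i64>& a1, i64 p1,
                            const std::vector<i64>& a2, i64 p2) {
  const i64 p1Inv = invMod(p1, p2);  // p1^{-1} mod p2
  std::vector<i64> out(a1.size());
  for (std::size_t i = 0; i < a1.size(); ++i) {
    // Find x = a1[i] + p1*k such that x = a2[i] (mod p2).
    i64 diff = ((a2[i] - a1[i]) % p2 + p2) % p2;
    i64 k = diff * p1Inv % p2;
    out[i] = a1[i] + p1 * k;
  }
  return out;
}
```

For example, the polynomial $5 + 50X + 76X^2$ has coefficient vectors (5, 1, 6) modulo 7 and (5, 6, 10) modulo 11, and crtCombine recovers (5, 50, 76) modulo 77. More primes can be folded in one at a time, by repeating the step with $p_1$ replaced by the product of the primes combined so far.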

2.3  bluestein  and  Cmodulus:  Polynomials  in  FFT  Representation 

The bluestein module implements a non-power-of-two FFT over a prime field $\mathbb{Z}_p$, using the Bluestein
FFT algorithm [1]. We use modulo-p polynomials to encode the FFT's inputs and outputs. Specifically,
this module builds on Shoup's NTL library [9], and contains both a bigint version with types
ZZ_p and ZZ_pX, and a smallint version with types zz_p and zz_pX. We have the following functions:

void BluesteinFFT(ZZ_pX& x, const ZZ_pX& a, long n, const ZZ_p& root,
                  ZZ_pX& powers, FFTRep& Rb);

void BluesteinFFT(zz_pX& x, const zz_pX& a, long n, const zz_p& root,
                  zz_pX& powers, fftRep& Rb);

These functions compute the length-$n$ FFT of the coefficient vector of a and put the result in x. If the
degree of a is less than $n$ then the top coefficients are treated as 0, and if the degree is more than $n$
then the extra coefficients are ignored. Similarly, if the top entries in x are zeros then x will have
degree smaller than $n$. The argument root needs to be a $2n$-th root of unity in $\mathbb{Z}_p$. The inverse FFT
is obtained just by calling BluesteinFFT(..., $\mathit{root}^{-1}$, ...), but this procedure is NOT SCALED.
Hence calling BluesteinFFT(x, a, n, root, ...) and then BluesteinFFT(b, x, n, $\mathit{root}^{-1}$, ...) will
result in having $b = n \cdot a$.
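To make the role of the powers array concrete, the following sketch computes the same length-$n$ transform over $\mathbb{Z}_p$ via Bluestein's identity $2jk = j^2 + k^2 - (k-j)^2$. It is a simplified illustration, not the library's BluesteinFFT: it works with plain 64-bit integers instead of NTL types, assumes $p$ is a prime below $2^{32}$ and that root is a primitive $2n$-th root of unity, and does the convolution naively in $O(n^2)$ time instead of through a size-$N$ power-of-two FFT (which is what the Rb array is for). The names powMod and bluesteinSketch are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// b^e mod p by square-and-multiply; assumes p < 2^32 so products fit in 64 bits.
u64 powMod(u64 b, u64 e, u64 p) {
  u64 r = 1;
  b %= p;
  while (e) {
    if (e & 1) r = r * b % p;
    b = b * b % p;
    e >>= 1;
  }
  return r;
}

// Length-n transform X_k = sum_{j<n} a_j * root^(2jk) (mod p), computed via
//   X_k = root^(k^2) * sum_j (a_j * root^(j^2)) * root^(-(k-j)^2).
// root must be a primitive 2n-th root of unity mod the prime p.
// The convolution is done naively in O(n^2) for clarity.
std::vector<u64> bluesteinSketch(const std::vector<u64>& a, u64 n, u64 root, u64 p) {
  const u64 rootInv = powMod(root, p - 2, p);  // Fermat inverse, p prime
  std::vector<u64> powers(n), negPowers(n);
  for (u64 i = 0; i < n; ++i) {
    u64 e = (i * i) % (2 * n);             // root has order 2n
    powers[i] = powMod(root, e, p);        // root^(i^2)
    negPowers[i] = powMod(rootInv, e, p);  // root^(-i^2)
  }
  std::vector<u64> X(n, 0);
  for (u64 k = 0; k < n; ++k) {
    u64 acc = 0;
    for (u64 j = 0; j < n; ++j) {          // coefficients beyond n are ignored
      u64 aj = j < a.size() ? a[j] % p : 0;  // missing top coefficients are 0
      u64 d = k >= j ? k - j : j - k;        // (k-j)^2 == (j-k)^2
      acc = (acc + aj * powers[j] % p * negPowers[d]) % p;
    }
    X[k] = acc * powers[k] % p;
  }
  return X;
}
```

With $p = 17$, $n = 4$ and root $= 2$ (a primitive 8th root of unity mod 17), transforming (1, 2, 3, 4) gives (10, 7, 15, 6), and transforming that result with $\mathit{root}^{-1} = 9$ gives (4, 8, 12, 16), that is, $4 \cdot (1, 2, 3, 4)$, illustrating that the inverse is not scaled.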

In addition to the size-$n$ FFT of a, which is returned in x, this procedure also returns the
powers of root in the powers argument, $\mathit{powers} = (1, \mathit{root}, \mathit{root}^4, \mathit{root}^9, \ldots, \mathit{root}^{(n-1)^2})$. In the
Rb argument it returns the size-$N$ FFT representation of the negative powers, for some $N \ge 2n - 1$,
$N$ a power of two:

$$\mathit{Rb} = \mathrm{FFT}_N\left(0, \ldots, 0,\ \mathit{root}^{-(n-1)^2}, \ldots, \mathit{root}^{-4},\ \mathit{root}^{-1},\ 1,\ \mathit{root}^{-1},\ \mathit{root}^{-4}, \ldots, \mathit{root}^{-(n-1)^2},\ 0, \ldots, 0\right).$$
On subsequent calls with the same powers and Rb, these arrays are not computed again but taken
from the pre-computed arguments. If the powers and Rb arguments are initialized, then it is
assumed that they were computed correctly from root. The behavior is undefined when calling
with initialized powers and Rb but a different root. (In particular, to compute the inverse FFT
using $\mathit{root}^{-1}$, one must provide different powers and Rb arguments than those that were given
when computing in the forward direction using root.) This procedure cannot be used for in-place
FFT: calling BluesteinFFT(x, x, ...) will just zero out the polynomial x.

The  classes  Cmodulus  and  CModulus.  These  classes  provide  an  interface  layer  for  the  FFT 
routines  above,  relative  to  a  single  prime  (where  Cmodulus  is  used  for  smallint  primes  and  CModulus 
for  bigint  primes).  They  keep  the  NTL  “current  modulus”  structure  for  that  prime,  as  well  as  the 
powers  and  Rb  arrays  for  FFT  and  inverse-FFT  under  that  prime.  They  are  constructed  with  the 
constructors 

Cmodulus(const PAlgebra& ZmStar, const long& q, const long& root);

CModulus(const PAlgebra& ZmStar, const ZZ& q, const ZZ& root);

where ZmStar describes the structure of $\mathbb{Z}_m^*$ (see Section 2.4), q is the prime modulus and root
is a primitive $2m$-th root of unity modulo q. (If the constructor is called with root = 0 then it
computes a $2m$-th root of unity by itself.) Once an object of one of these classes is constructed, it
provides an FFT interface via

void Cmodulus::FFT(vec_long& y, const ZZX& x) const;   // y = FFT(x)

void Cmodulus::iFFT(ZZX& x, const vec_long& y) const;  // x = FFT^{-1}(y)

(And similarly for CModulus, using vec_ZZ instead of vec_long.) These methods are inverses of
each other. The methods of these classes affect the NTL “current modulus”, and it is the
responsibility of the caller to back up and restore the modulus if needed (using the NTL constructs
zz_pBak/ZZ_pBak).
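When a Cmodulus object is constructed with root = 0 it computes a $2m$-th root of unity by itself. One standard way to do this is sketched below; we do not claim it is the library's method. Since $q$ is prime and $2m \mid q - 1$, raising any $g$ to the power $(q-1)/2m$ gives an element $r$ whose order divides $2m$, and $r$ is primitive exactly when $r^{2m/f} \ne 1$ for every prime $f$ dividing $2m$. The names primeFactors and findPrimitive2mRoot are hypothetical, and $q$ is assumed to be below $2^{32}$.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// b^e mod q; assumes q < 2^32 so products fit in 64 bits.
u64 powMod(u64 b, u64 e, u64 q) {
  u64 r = 1;
  b %= q;
  while (e) {
    if (e & 1) r = r * b % q;
    b = b * b % q;
    e >>= 1;
  }
  return r;
}

// Distinct prime factors of n, by trial division.
std::vector<u64> primeFactors(u64 n) {
  std::vector<u64> fs;
  for (u64 d = 2; d * d <= n; ++d)
    if (n % d == 0) {
      fs.push_back(d);
      while (n % d == 0) n /= d;
    }
  if (n > 1) fs.push_back(n);
  return fs;
}

// Hypothetical helper: a primitive (2m)-th root of unity modulo the prime q,
// or 0 if 2m does not divide q-1 (then no such root exists).
u64 findPrimitive2mRoot(u64 m, u64 q) {
  const u64 order = 2 * m;
  if ((q - 1) % order != 0) return 0;
  const std::vector<u64> fs = primeFactors(order);
  for (u64 g = 2; g < q; ++g) {
    u64 r = powMod(g, (q - 1) / order, q);  // r^(2m) == 1 by Fermat
    bool primitive = true;
    for (u64 f : fs)
      if (powMod(r, order / f, q) == 1) { primitive = false; break; }
    if (primitive) return r;
  }
  return 0;  // unreachable when 2m | q-1, since Z_q^* is cyclic
}
```

For $m = 4$ and $q = 17$, the search rejects $g = 2$ (it yields 4, whose order is only 4) and returns 9, whose order is 8.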

2.4 PAlgebra: The Structure of $\mathbb{Z}_m^*$ and $\mathbb{Z}_m^*/\langle 2\rangle$

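The class PAlgebra is the base class containing the structure of $\mathbb{Z}_m^*$, as well as the quotient group
$\mathbb{Z}_m^*/\langle 2\rangle$. We represent $\mathbb{Z}_m^*$ as $\mathbb{Z}_m^* = \langle 2\rangle \times \langle g_1, g_2, \ldots\rangle \times \langle h_1, h_2, \ldots\rangle$, where the $g_i$'s have the same
order in $\mathbb{Z}_m^*$ as in $\mathbb{Z}_m^*/\langle 2\rangle$, and the $h_i$'s generate the group $\mathbb{Z}_m^*/\langle 2, g_1, g_2, \ldots\rangle$ and do not have
the same order in $\mathbb{Z}_m^*$ as in $\mathbb{Z}_m^*/\langle 2\rangle$.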

We compute this representation in a manner similar (but not identical) to the proof of the
fundamental theorem of finitely generated abelian groups. Namely, we keep the elements in equivalence
classes of the “quotient group so far”, and each class has a representative element (called a pivot),
which in our case we simply choose to be the smallest element in the class. Initially each element
is in its own class. At every step, we choose the highest-order element $g$ in the current quotient
group and add it as a new generator, then unify classes whose members differ by a factor of $g$,
repeating this process until no further unification is possible. Since we are interested in the
quotient group $\mathbb{Z}_m^*/\langle 2\rangle$, we always choose 2 as the first generator.

One twist in this routine is that initially we only choose an element as a new generator if its
order in the current quotient group is the same as in the original group $\mathbb{Z}_m^*$. Only after no such
elements are available do we begin to use generators that do not have the same order as in $\mathbb{Z}_m^*$.
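The first step of this procedure, in which 2 is chosen as the first generator, can be sketched as follows: every unit is assigned to its class $x \cdot \langle 2\rangle$, and the smallest element of each class serves as its pivot. This is a hypothetical illustration (the names gcdL, TwoCosets and cosetsOfTwo are ours); it assumes $m$ is odd and omits the later steps that choose the remaining generators $g_i$ and $h_i$.

```cpp
#include <cassert>
#include <vector>

long gcdL(long a, long b) {
  while (b != 0) { long t = a % b; a = b; b = t; }
  return a;
}

struct TwoCosets {
  long d;                  // order of 2 in Z_m^*
  std::vector<long> reps;  // pivot (smallest element) of each class x*<2>
};

// Sketch of the first step only: partition Z_m^* into the classes x*<2> and
// keep the smallest element of each class. Assumes m is odd and m > 1.
TwoCosets cosetsOfTwo(long m) {
  TwoCosets out{0, {}};
  long x = 1;
  do { x = 2 * x % m; ++out.d; } while (x != 1);  // d = ord_m(2)
  std::vector<bool> seen(m, false);
  for (long u = 1; u < m; ++u) {                   // increasing order => smallest pivot
    if (gcdL(u, m) != 1 || seen[u]) continue;
    out.reps.push_back(u);                         // u is the pivot of a new class
    for (long y = u, i = 0; i < out.d; ++i, y = 2 * y % m) seen[y] = true;
  }
  return out;
}
```

For $m = 15$ the order of 2 is $d = 4$ and the pivots are 1 and 7, so there are $\phi(15)/d = 2$ classes; for $m = 31$ the order is 5 and the six pivots are 1, 3, 5, 7, 11, 15.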

Once we have chosen all the generators (and for each generator we have computed its order in the quotient
group where it was chosen), we compute a set of “slot representatives” as follows: Putting all the
$g_i$'s and $h_i$'s in one list, let us denote the generators of $\mathbb{Z}_m^*/\langle 2\rangle$ by $\{f_1, f_2, \ldots, f_n\}$, and let $\mathrm{ord}(f_i)$
be the order of $f_i$ in the quotient group at the time that it was added to the list of generators. Then
the slot-index representative set is

$$T \stackrel{\mathrm{def}}{=} \left\{ \prod_{i=1}^{n} f_i^{e_i} \bmod m \;:\; \forall i,\ e_i \in \{0, 1, \ldots, \mathrm{ord}(f_i) - 1\} \right\}.$$

Clearly, we have $T \subset \mathbb{Z}_m^*$, and moreover $T$ contains exactly one representative from each equivalence
class of $\mathbb{Z}_m^*/\langle 2\rangle$. Recall that we use these representatives in our encoding of plaintext slots, where
a polynomial $a \in \mathbb{A}_2$ is viewed as encoding the vector of $\mathbb{F}_{2^d}$ elements $\{a(\rho^t) \in \mathbb{F}_{2^d} : t \in T\}$, where
$\rho$ is some fixed primitive $m$-th root of unity in $\mathbb{F}_{2^d}$.

In addition to defining the sets of generators and representatives, the class PAlgebra also provides
translation methods between representations, specifically:

int ith_rep(unsigned i) const;

Returns $t_i$, i.e., the $i$'th representative from $T$.

int indexOfRep(unsigned t) const;

Returns the index $i$ such that ith_rep(i) $= t$.

int exponentiate(const vector<unsigned>& exps, bool onlySameOrd=false) const;

Takes a vector of exponents $(e_1, \ldots, e_n)$ and returns $t = \prod_{i=1}^{n} f_i^{e_i} \in T$.

const int* dLog(unsigned t) const;

On input some $t \in T$, returns the discrete logarithm of $t$ with the $f_i$'s as bases. Namely, a
vector exps $= (e_1, \ldots, e_n)$ such that exponentiate(exps) $= t$, and moreover $0 \le e_i < \mathrm{ord}(f_i)$
for all $i$.
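The relation between exponentiate and dLog can be illustrated with a small table-based sketch: enumerate every exponent vector $(e_1, \ldots, e_n)$ with $0 \le e_i < \mathrm{ord}(f_i)$, compute $t = \prod_i f_i^{e_i} \bmod m$, and store the vector under the key $t$. The class SlotIndexSketch is hypothetical and is not how PAlgebra stores this data; the generators and their orders are taken as inputs rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the translation methods: given generators f_i of
// Z_m^*/<2> and their orders ord(f_i), enumerate T and tabulate discrete logs.
class SlotIndexSketch {
 public:
  SlotIndexSketch(long m, std::vector<long> gens, std::vector<long> ords)
      : m_(m), gens_(std::move(gens)), ords_(std::move(ords)) {
    std::vector<long> exps(gens_.size(), 0);
    while (true) {  // mixed-radix counter over all exponent vectors
      dlog_[exponentiate(exps)] = exps;
      std::size_t i = 0;
      while (i < exps.size() && ++exps[i] == ords_[i]) exps[i++] = 0;
      if (i == exps.size()) break;  // the counter wrapped around: done
    }
  }

  // t = prod_i f_i^{e_i} mod m
  long exponentiate(const std::vector<long>& exps) const {
    long t = 1;
    for (std::size_t i = 0; i < gens_.size(); ++i)
      for (long e = 0; e < exps[i]; ++e) t = t * gens_[i] % m_;
    return t;
  }

  // The exponent vector with 0 <= e_i < ord(f_i) and exponentiate(exps) == t.
  const std::vector<long>& dLog(long t) const { return dlog_.at(t); }

  std::size_t size() const { return dlog_.size(); }  // |T|

 private:
  long m_;
  std::vector<long> gens_, ords_;
  std::map<long, std::vector<long>> dlog_;
};
```

For $m = 31$ one can take the single generator $f_1 = 3$ of $\mathbb{Z}_{31}^*/\langle 2\rangle$, with $\mathrm{ord}(f_1) = 6$; then $T = \{1, 3, 9, 27, 19, 26\}$, exponentiate({4}) returns 19, and dLog(26) returns {5}.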

2.5 PAlgebraModTwo/PAlgebraMod2r: Plaintext Slots

These two classes implement the structure of the plaintext spaces, either $\mathbb{A}_2 = \mathbb{A}/2\mathbb{A}$ (when using
mod-2 arithmetic for the plaintext space) or $\mathbb{A}_{2^r} = \mathbb{A}/2^r\mathbb{A}$ (when using mod-$2^r$ arithmetic, for
some small value of $r$, e.g., mod-128 arithmetic). We typically use the mod-2 arithmetic for real
computation, but we expect to use the mod-$2^r$ arithmetic for bootstrapping, as described in [6].
Below we cover the mod-2 case first, then extend it to mod-$2^r$.

For the mod-2 case, the plaintext slots are determined by the factorization of $\Phi_m(X)$ modulo 2
into $\ell$ degree-$d$ polynomials. Once we have that factorization, $\Phi_m(X) = \prod_j F_j(X) \pmod 2$, we
choose an arbitrary factor as the “first factor”, denote it $F_1(X)$, and this corresponds to the first
input slot (whose representative is $1 \in T$). With each representative $t \in T$ we then associate
the factor $\mathrm{GCD}(F_1(X^t), \Phi_m(X))$, with the polynomial GCD computed modulo 2. Note that fixing a
representation of the field $K = \mathbb{Z}_2[X]/F_1(X) \cong \mathbb{F}_{2^d}$ and letting $\rho$ be a root of $F_1$ in $K$, we get that
the factor associated with the representative $t$ is the minimal polynomial of $\rho^{1/t}$. Put another way,
if the roots of $F_1$ in $K$ are $\rho, \rho^2, \rho^4, \ldots, \rho^{2^{d-1}}$, then the roots of the factor
associated with $t$ are $\rho^{1/t}, \rho^{2/t}, \rho^{4/t}, \ldots, \rho^{2^{d-1}/t}$, where the arithmetic in the exponent is modulo $m$.
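The association between representatives and factors only needs polynomial arithmetic over GF(2). The sketch below packs GF(2) polynomials into 64-bit words (bit $i$ is the coefficient of $X^i$) and computes $\mathrm{GCD}(F_1(X^t), \Phi_m(X)) \bmod 2$ directly. It is an illustration under the assumption that all degrees involved stay below 64, not the NTL GF2X-based code of the library, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// GF(2) polynomials packed into 64-bit words: bit i is the coefficient of X^i.
using GF2Poly = std::uint64_t;

int degree(GF2Poly a) {  // degree(0) == -1
  int d = -1;
  while (a != 0) { a >>= 1; ++d; }
  return d;
}

GF2Poly polyMod(GF2Poly a, GF2Poly b) {  // a mod b, for b != 0
  const int db = degree(b);
  for (int da = degree(a); da >= db; da = degree(a)) a ^= b << (da - db);
  return a;
}

GF2Poly polyGcd(GF2Poly a, GF2Poly b) {  // Euclid; nonzero GF(2) polys are monic
  while (b != 0) { GF2Poly r = polyMod(a, b); a = b; b = r; }
  return a;
}

// F(X) -> F(X^t); assumes degree(F) * t < 64.
GF2Poly composeXt(GF2Poly f, int t) {
  GF2Poly out = 0;
  for (int i = 0; f != 0; ++i, f >>= 1)
    if (f & 1) out |= GF2Poly(1) << (i * t);
  return out;
}

// Factor of Phi_m(X) mod 2 associated with representative t: GCD(F1(X^t), Phi_m(X)).
GF2Poly slotFactor(GF2Poly F1, int t, GF2Poly phiM) {
  return polyGcd(composeXt(F1, t), phiM);
}
```

For $m = 7$ we have $d = 3$ (the order of 2 mod 7), $\Phi_7(X) \equiv (X^3 + X + 1)(X^3 + X^2 + 1) \pmod 2$, and $T = \{1, 3\}$. Taking $F_1 = X^3 + X + 1$, the representative $t = 3$ is associated with $\mathrm{GCD}(X^9 + X^3 + 1, \Phi_7(X)) = X^3 + X^2 + 1$, whose roots $\rho^3, \rho^5, \rho^6$ are exactly $\rho^{1/3}, \rho^{2/3}, \rho^{4/3}$, since $1/3 \equiv 5 \pmod 7$.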

After computing the factors of $\Phi_m(X)$ modulo 2 and the correspondence between these factors
and the representatives from $T$, the class PAlgebraModTwo provides encoding/decoding methods to
pack elements in polynomials and unpack them back. Specifically, we have the following methods:
void mapToSlots(vector<GF2X>& maps, const GF2X& G) const;

Computes the mapping between base-G representation and representation relative to the slot
polynomials. (See more discussion below.)

void embedInSlots(GF2X& a, const vector<GF2X>& alphas, const vector<GF2X>& maps) const;

Uses the maps that were computed in mapToSlots to embed the plaintext values in alphas
into the slots of the polynomial $a \in \mathbb{A}_2$. Namely, for every plaintext slot $i$ with representative
$t_i \in T$, we have $a(\rho^{t_i}) = $ alphas[i]. Note that alphas[i] is an element in base-G representation,
while $a(\rho^{t_i})$ is computed relative to the representation of $\mathbb{F}_{2^d}$ as $\mathbb{Z}_2[X]/F_1(X)$. (See
more discussion below.)

void decodePlaintext(vector<GF2X>& alphas, const GF2X& a,
                     const GF2X& G, const vector<GF2X>& maps) const;

This is the inverse of embedInSlots: it returns in alphas a vector of base-G elements such
that alphas[i] $= a(\rho^{t_i})$.

void CRT_decompose(vector<GF2X>& crt, const GF2X& p) const;

Returns a vector of polynomials such that crt[i] $= p \bmod F_{t_i}$ (with $t_i$ being the $i$'th
representative in $T$).

void CRT_reconstruct(GF2X& p, vector<GF2X>& crt) const;

Returns a polynomial $p \in \mathbb{A}_2$ such that for every $i < \ell$ and $t_i = T[i]$, we have $p \equiv$ crt[i] $\pmod{F_{t_i}}$.

The use of the first three functions may need some more explanation. As an illustrative example,
consider the case of the AES computation, where we embed bytes of the AES state in the slots,
considered as elements of $\mathbb{F}_{2^8}$ relative to the AES polynomial $G(X) = X^8 + X^4 + X^3 + X + 1$.
We choose our parameters so that $8 \mid d$ (where $d$ is the order of 2 in $\mathbb{Z}_m^*$), and then use the
functions above to embed the bytes into our plaintext slots and extract them back.

We first call mapToSlots(maps, G) to compute the mapping from the base-G representation
that we use for AES to the “native” representation of our cryptosystem (i.e., relative to $F_1$,
which is one of the degree-$d$ factors of $\Phi_m(X)$). Once we have maps, we use them to embed bytes
in a polynomial with embedInSlots, and to decode them back with decodePlaintext.

The case of plaintext space modulo $2^r$, implemented in the class PAlgebraMod2r, is similar.
The partition into factors of $\Phi_m(X)$ modulo $2^r$ and their association with representatives in $T$ is
done similarly, by first computing everything modulo 2, then using Hensel lifting to lift into a
factorization modulo $2^r$. In particular, we have the same number $\ell$ of factors of the same degree $d$.
One difference between the two classes is that when working modulo 2 we can have elements of
an extension field $\mathbb{F}_{2^d}$ in the slots, but when working modulo $2^r$ we enforce the constraint that
the slots contain only scalars (i.e., $r$-bit signed integers, in the range $[-2^{r-1}, 2^{r-1})$). This means
that the polynomial $G$ that we use for the representation of the plaintext values is set to the linear
polynomial $G(X) = X$. Other than this change, the methods for PAlgebraMod2r are the same as
those of PAlgebraModTwo, except that we use the NTL types zz_p and zz_pX rather than GF2 and
GF2X. The methods of the PAlgebraMod2r class affect the NTL “current modulus”, and it is the
responsibility of the caller to back up and restore the modulus if needed (using the NTL constructs
zz_pBak/ZZ_pBak).
2.6 IndexSet and IndexMap: Sets and Indexes

In  our  implementation,  all  the  polynomials  are  represented  in  double-CRT  format,  relative  to  some 
subset  of  the  small  primes  in  our  list  (cf.  Section  1.2).  The  subset  itself  keeps  changing  throughout 
the  computation,  and  so  we  can  have  the  same  polynomial  represented  at  one  point  relative  to 
many  primes,  then  a  small  number  of  primes,  then  many  primes  again,  etc.  (For  example  see  the 
implementation  of  key-switching  in  Section  3.1.6.)  To  provide  flexibility  with  these  transformations, 
the  IndexSet  class  implements  an  arbitrary  subset  of  integers,  and  the  IndexMap  class  implements 
a  collection  of  data  items  that  are  indexed  by  such  a  subset. 

2.6.1  The  IndexSet  class 

The IndexSet class implements all the standard interfaces of the abstract data type of a set, along
with just a few extra interfaces that are specialized to sets of integers. It uses the standard C++
container vector<bool> to keep the actual set, and provides the following methods:

Constructors. The constructors IndexSet(), IndexSet(long j), and IndexSet(long low, long
high) initialize an empty set, a singleton, and an interval, respectively.

Empty sets and cardinality. The static method IndexSet::emptySet() provides read-only
access to an empty set, and the method s.clear() removes all the elements in s, which is
equivalent to s=IndexSet::emptySet().

The method s.card() returns the number of elements in s.

Traversing a set. The methods s.first() and s.last() return the smallest and largest element
in the set, respectively. For an empty set s, s.first() returns 0 and s.last() returns -1.

The method s.next(j) returns the next element after j, if any; otherwise j+1. Similarly,
s.prev(j) returns the previous element before j, if any; otherwise j-1. With these methods,
we can iterate through a set s using one of:

for (long i = s.first(); i <= s.last(); i = s.next(i)) ...
for (long i = s.last(); i >= s.first(); i = s.prev(i)) ...

Comparison and membership methods. operator== and operator!= are provided to test for
equality, whereas s1.disjointFrom(s2) and its synonym disjoint(s1,s2) test if the two
sets are disjoint. Also, s.contains(j) returns true if s contains the element j, and s.contains(other)
returns true if s is a superset of other. For convenience, the operators <=, <, >= and > are
also provided for testing the subset relation between sets.

Set operations. The method s.insert(j) inserts the integer j if it is not in s, and s.remove(j)
removes it if it is there.

Similarly, s1.insert(s2) returns in s1 the union of the two sets, and s1.remove(s2) returns
in s1 the set difference s1 \ s2. Also, s1.retain(s2) returns in s1 the intersection of the
two sets. For convenience we also provide the operators s1|s2 (union), s1&s2 (intersection),
s1^s2 (symmetric difference, aka xor), and s1/s2 (set difference).
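A stripped-down version of this interface, backed by vector<bool> as described above, shows how the first/last/next/prev conventions make the iteration loops terminate correctly, including on the empty set. IndexSetSketch is a hypothetical class that covers only part of the interface (no comparison operators, retain, or symmetric difference); it assumes all elements are non-negative and that the two-argument constructor builds the inclusive interval [low, high].

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified set of non-negative integers backed by vector<bool>.
class IndexSetSketch {
 public:
  IndexSetSketch() = default;
  IndexSetSketch(long low, long high) {  // interval [low, high], assumed inclusive
    for (long j = low; j <= high; ++j) insert(j);
  }

  void insert(long j) {  // assumes j >= 0
    if (j >= size()) bits_.resize(j + 1, false);
    if (!bits_[j]) { bits_[j] = true; ++card_; }
  }
  void remove(long j) {
    if (contains(j)) { bits_[j] = false; --card_; }
  }
  bool contains(long j) const { return j >= 0 && j < size() && bits_[j]; }
  long card() const { return card_; }

  long first() const {  // smallest element, or 0 for the empty set
    for (long j = 0; j < size(); ++j)
      if (bits_[j]) return j;
    return 0;
  }
  long last() const {  // largest element, or -1 for the empty set
    for (long j = size() - 1; j >= 0; --j)
      if (bits_[j]) return j;
    return -1;
  }
  long next(long j) const {  // next element after j, else j+1
    for (long i = j + 1; i < size(); ++i)
      if (bits_[i]) return i;
    return j + 1;
  }
  long prev(long j) const {  // previous element before j, else j-1
    for (long i = std::min(j - 1, size() - 1); i >= 0; --i)
      if (bits_[i]) return i;
    return j - 1;
  }

  friend IndexSetSketch operator|(const IndexSetSketch& a, const IndexSetSketch& b) {
    IndexSetSketch r = a;  // union
    for (long j = b.first(); j <= b.last(); j = b.next(j)) r.insert(j);
    return r;
  }
  friend IndexSetSketch operator&(const IndexSetSketch& a, const IndexSetSketch& b) {
    IndexSetSketch r;  // intersection
    for (long j = a.first(); j <= a.last(); j = a.next(j))
      if (b.contains(j)) r.insert(j);
    return r;
  }

 private:
  long size() const { return static_cast<long>(bits_.size()); }
  std::vector<bool> bits_;  // bits_[j] is true iff j is in the set
  long card_ = 0;
};
```

Because next(j) returns j+1 when there is no later element, and last() is -1 for an empty set, the forward loop above visits each element exactly once and does nothing on an empty set.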
2.6.2 The IndexMap class


The class template IndexMap<T> implements a map of elements of type T, indexed by a dynamic
IndexSet. Additionally, it allows new elements of the map to be initialized in a flexible manner,
by providing an initialization function which is called whenever a new element (indexed by a new
index j) is added to the map.

Specifically, we have a helper class template IndexMapInit<T> that stores a pointer to an
initialization function, and possibly also other parameters that the initialization function needs.
We then provide a constructor IndexMap(IndexMapInit<T>* initObject=NULL) that associates
the given initialization object with the new IndexMap object. Thereafter, when a new index j is
added to the index set, an object t of type T is created using the default constructor for T, after
which the function initObject->init(t) is called.

In our library, we use an IndexMap to store the rows of the matrix of a Double-CRT object.
For these objects we have an initialization object that stores the value of $\phi(m)$, and the
initialization function, which is called whenever we add a new row, ensures that all the rows have
length exactly $\phi(m)$.

After initialization, an IndexMap object provides the operator map[i] to access the type-T object
indexed by i (if i currently belongs to the IndexSet), as well as the methods map.insert(i) and
map.remove(i) to insert or delete a single data item indexed by i, and also map.insert(s) and
map.remove(s) to insert or delete a collection of data items indexed by the IndexSet s.
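The initialization-object pattern can be sketched with a std::map and a small virtual interface. In this hypothetical illustration (IndexMapInitSketch, RowInit and IndexMapSketch are our names, not the library's), RowInit plays the role described above for Double-CRT rows: every newly inserted row is default-constructed and then resized to exactly $\phi(m)$ entries.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical interface for an initialization object (cf. IndexMapInit<T>).
template <class T>
struct IndexMapInitSketch {
  virtual void init(T& t) const = 0;
  virtual ~IndexMapInitSketch() = default;
};

// Initializer for Double-CRT rows: every new row gets exactly phim entries.
struct RowInit : IndexMapInitSketch<std::vector<long>> {
  long phim;
  explicit RowInit(long phiOfM) : phim(phiOfM) {}
  void init(std::vector<long>& row) const override {
    row.assign(static_cast<std::size_t>(phim), 0L);
  }
};

// Map from integer indexes to T; new entries are default-constructed, then initialized.
template <class T>
class IndexMapSketch {
 public:
  explicit IndexMapSketch(const IndexMapInitSketch<T>* initObject = nullptr)
      : init_(initObject) {}

  void insert(long i) {
    if (items_.count(i) != 0) return;      // already present: keep existing data
    T t{};                                 // default constructor for T ...
    if (init_ != nullptr) init_->init(t);  // ... then the initialization function
    items_.emplace(i, std::move(t));
  }
  void remove(long i) { items_.erase(i); }
  bool contains(long i) const { return items_.count(i) != 0; }
  std::size_t size() const { return items_.size(); }
  T& operator[](long i) { return items_.at(i); }  // throws if i is not present

 private:
  std::map<long, T> items_;
  const IndexMapInitSketch<T>* init_;  // not owned
};
```

Keeping the initializer as a separate, non-owned object means one RowInit can be shared by every row map built for the same $\phi(m)$.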

2.7  FHEcontext:  Keeping  the  parameters 

Objects in higher layers of our library are defined relative to some parameters, such as the integer
parameter $m$ (that defines the groups $\mathbb{Z}_m^*$ and $\mathbb{Z}_m^*/\langle 2\rangle$ and the ring $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$) and the
sequence of small primes that determine our modulus chain. To allow convenient access to these
parameters, we define the class FHEcontext that keeps them all and provides access methods and
some utility functions.

One  thing  that’s  included  in  FHEcontext  is  a  vector  of  Cmodulus  objects,  holding  the  small 
primes  that  define  our  modulus  chain: 

vector<Cmodulus>  moduli;  //  Cmodulus  objects  for  the  different  primes 

We provide access to the Cmodulus objects via context.ithModulus(i) (that returns a
reference of type const Cmodulus&), and to the small primes themselves via context.ithPrime(i)
(that returns a long). The FHEcontext also includes the various algebraic structures for plaintext
arithmetic; specifically, we have the three data members:

PAlgebra zMstar;        // The structure of Z_m^*

PAlgebraModTwo modTwo;  // The structure of Z[X]/(Phi_m(X), 2)

PAlgebraMod2r mod2r;    // The structure of Z[X]/(Phi_m(X), 2^r)

In  addition  to  the  above,  the  FHEcontext  contains  a  few  IndexSet  objects,  describing  various 
partitions  of  the  index-set  in  the  vector  of  moduli.  These  partitions  are  used  when  generating  the 
key-switching  matrices  in  the  public  key,  and  when  using  them  to  actually  perform  key-switching 
on  ciphertexts. 

One such partition is “ciphertext” vs. “special” primes: Freshly encrypted ciphertexts are
encrypted relative to a subset of the small primes, called the ciphertext primes. All other primes
are only used during key-switching; these are called the special primes. The ciphertext primes, in
turn, are sometimes partitioned further into a number of “digits”, corresponding to the columns in
our key-switching matrices. (See the explanation of this partition in Section 3.1.6.) These subsets
are stored in the following data members:

IndexSet ctxtPrimes;      // the ciphertext primes

IndexSet specialPrimes;   // the "special" primes

vector<IndexSet> digits;  // digits of ctxt/columns of key-switching matrix

The FHEcontext class also provides some convenience functions for computing the product of a
subset of small primes, as well as the “size” of that product (i.e., its logarithm), via the methods:

ZZ productOfPrimes(const IndexSet& s) const;

void productOfPrimes(ZZ& p, const IndexSet& s) const;

double logOfPrime(unsigned i) const;  // = log(ithPrime(i))
double logOfProduct(const IndexSet& s) const;

Finally, the FHEcontext module includes some utility functions for adding moduli to the chain.
The method addPrime(long p, bool isSpecial) adds a single prime p (either “special” or not),
after checking that p has 2m'th roots of unity and that it is not already in the list. Then we have three
higher-level functions:

double AddPrimesBySize(FHEcontext& c, double size, bool special=false);

Adds to the chain primes whose product is at least exp(size), and returns the natural logarithm
of the product of all added primes.

double AddPrimesByNumber(FHEcontext& c, long n, long atLeast=1, bool special=false);

Adds n primes to the chain, all at least as large as the atLeast argument, and returns the natural
logarithm of the product of all added primes.

void buildModChain(FHEcontext& c, long d, long t=3);

Builds a modulus chain for a circuit of depth d, using t digits in key-switching. This function
puts d ciphertext primes in the moduli vector, and then as many “special” primes as needed
to mod-switch fresh ciphertexts (see Section 3.1.6).
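The requirement that p has 2m'th roots of unity amounts to $2m \mid p - 1$, since $\mathbb{Z}_p^*$ is cyclic of order $p - 1$. The hypothetical helper choosePrimesBySize below sketches one way to collect such primes until the natural logarithm of their product reaches a target, in the spirit of AddPrimesBySize; the search direction (downward from a start value) and the trial-division primality test are our own simplifying assumptions, not the library's prime-selection policy.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

bool isPrime(long n) {  // trial division; adequate only for small n
  if (n < 2) return false;
  for (long d = 2; d * d <= n; ++d)
    if (n % d == 0) return false;
  return true;
}

// Hypothetical helper: walk down from `start` (assumed >= 1) over candidates
// p = 1 (mod 2m), so that Z_p contains 2m-th roots of unity, and keep the primes
// until the natural log of their product reaches `size`.
std::vector<long> choosePrimesBySize(long m, double size, long start) {
  std::vector<long> primes;
  double logProd = 0.0;
  const long step = 2 * m;
  // The first candidate is the largest p <= start with p = 1 (mod 2m).
  for (long p = start - (start - 1) % step; p > 1 && logProd < size; p -= step) {
    if (!isPrime(p)) continue;
    primes.push_back(p);
    logProd += std::log(static_cast<double>(p));
  }
  return primes;
}
```

Each candidate p satisfies $p \equiv 1 \pmod{2m}$ by construction, so the only test needed per candidate is primality.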

2.8  DoubleCRT:  Efficient  Polynomial  Arithmetic 

The heart of our library is the DoubleCRT class that manipulates polynomials in Double-CRT
representation. A DoubleCRT object is tied to a specific FHEcontext, and at any given time it is
defined relative to a subset of small primes from our list, $S \subseteq \{0, \ldots, \mathtt{context.moduli.size()} - 1\}$.
Denoting the product of these small primes by $q = \prod_{i \in S} p_i$, a DoubleCRT object represents a
polynomial $a \in \mathbb{A}_q$ by a matrix with $\phi(m)$ columns and one row for each small prime $p_i$ (with $i \in S$).
The $i$'th row contains the FFT representation of $a$ modulo $p_i$, i.e., the evaluations $\{[a(\zeta^j)]_{p_i} : j \in \mathbb{Z}_m^*\}$,
where $\zeta$ is some primitive $m$-th root of unity modulo $p_i$.
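The following sketch builds Double-CRT rows for a toy modulus and checks that multiplication becomes an entry-by-entry product within each row. It is a hypothetical illustration with plain 64-bit integers (toRow and mulRows are our names); it takes the primitive $m$-th root $\zeta$ for each prime as an input, assumes $p < 2^{31}$, and evaluates at $\zeta^j$ by Horner's rule rather than by an FFT. Because each $\zeta^j$ with $j \in \mathbb{Z}_m^*$ is a root of $\Phi_m$, evaluating an unreduced product polynomial gives the same values as evaluating its reduction modulo $\Phi_m$.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using i64 = std::int64_t;
using Row = std::vector<i64>;

i64 powModI(i64 b, i64 e, i64 p) {  // assumes b >= 0, e >= 0, p < 2^31
  i64 r = 1;
  b %= p;
  while (e > 0) {
    if (e & 1) r = r * b % p;
    b = b * b % p;
    e >>= 1;
  }
  return r;
}

i64 gcdI(i64 a, i64 b) {
  while (b != 0) { i64 t = a % b; a = b; b = t; }
  return a;
}

// One row of a Double-CRT matrix: a(zeta^j) mod p for every j in Z_m^*, in
// increasing order of j. zeta must be a primitive m-th root of unity mod p.
Row toRow(const Row& coeffs, i64 m, i64 p, i64 zeta) {
  Row row;
  for (i64 j = 1; j < m; ++j) {
    if (gcdI(j, m) != 1) continue;  // only j in Z_m^*: phi(m) columns
    const i64 x = powModI(zeta, j, p);
    i64 acc = 0;
    for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it)  // Horner's rule
      acc = (acc * x + ((*it % p) + p) % p) % p;
    row.push_back(acc);
  }
  return row;
}

// Multiplication in this representation is entry by entry, one prime at a time.
Row mulRows(const Row& a, const Row& b, i64 p) {
  Row c(a.size());
  for (std::size_t k = 0; k < a.size(); ++k) c[k] = a[k] * b[k] % p;
  return c;
}
```

With $m = 5$, the primes 11 and 31 (both $\equiv 1 \bmod 5$) and the roots 3 and 2 respectively, each row has $\phi(5) = 4$ entries, and the rows of $(1 + X)(2 + X^2) = 2 + 2X + X^2 + X^3$ equal the entry-wise products of the rows of $1 + X$ and $2 + X^2$.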

Although the FHEcontext must remain fixed throughout, the set $S$ of primes can change
dynamically, and so the matrix can lose some rows and add other ones as we go. We thus keep these
rows in a dynamic IndexMap data member, and the current set of indexes $S$ is available via the
method getIndexSet(). We provide the following methods for changing the set of primes:
void addPrimes(const IndexSet& s);

Expands the index set by s. It is assumed that s is disjoint from the current index set. This is
an expensive operation, as it needs to convert to coefficient representation and back, in order
to determine the values in the added rows.

double addPrimesAndScale(const IndexSet& S);

Expands the index set by S, and multiplies by $q_{\mathrm{diff}} = \prod_{i \in S} p_i$. The set S is assumed to be
disjoint from the current index set. Returns $\log(q_{\mathrm{diff}})$. This operation is typically much faster
than addPrimes, since we can fill the added rows with zeros.

void removePrimes(const IndexSet& s);

Removes the primes $p_i$ with $i \in s$ from the current index set.

void scaleDownToSet(const IndexSet& s, long ptxtSpace);

This is a modulus-switching operation. Let $\Delta$ be the set of primes that are removed,
$\Delta = \mathtt{getIndexSet()} \setminus s$, and $q_{\mathrm{diff}} = \prod_{i \in \Delta} p_i$. This operation removes the primes $p_i$, $i \in \Delta$,
scales down the polynomial by a factor of $q_{\mathrm{diff}}$, and rounds so as to keep $a \bmod \mathtt{ptxtSpace}$
unchanged.
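The rounding step of modulus switching can be illustrated on a single integer coefficient. The sketch below is our own illustration, not the library's code: it picks a correction $\delta$ with $\delta \equiv a \pmod{q_{\mathrm{diff}}}$ and $\delta \equiv 0 \pmod t$ of smallest absolute value, and returns $a' = (a - \delta)/q_{\mathrm{diff}}$. Then $a' \cdot q_{\mathrm{diff}} \equiv a \pmod t$ and $|a' - a/q_{\mathrm{diff}}| \le t/2$; in particular, when $q_{\mathrm{diff}} \equiv 1 \pmod t$, as for an odd $q_{\mathrm{diff}}$ and $t = 2$, we get $a' \equiv a \pmod t$. Here $t$ stands for ptxtSpace, $\gcd(q_{\mathrm{diff}}, t) = 1$ is assumed, and scaleDownCoeff is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

using i64 = std::int64_t;

i64 modFloor(i64 a, i64 n) {  // result in [0, n)
  i64 r = a % n;
  return r < 0 ? r + n : r;
}
i64 absI(i64 x) { return x < 0 ? -x : x; }

// Hypothetical per-coefficient modulus switching: a' = (a - delta) / qDiff with
// delta = a (mod qDiff), delta = 0 (mod t), and |delta| as small as possible.
// Assumes gcd(qDiff, t) == 1 and values small enough not to overflow.
i64 scaleDownCoeff(i64 a, i64 qDiff, i64 t) {
  i64 r = modFloor(a, qDiff);
  if (2 * r > qDiff) r -= qDiff;  // centered residue of a mod qDiff
  i64 best = 0;
  bool found = false;
  for (i64 k = -t; k <= t; ++k) {  // candidates delta = r + k*qDiff
    const i64 delta = r + k * qDiff;
    if (modFloor(delta, t) != 0) continue;
    if (!found || absI(delta) < absI(best)) { best = delta; found = true; }
  }
  return (a - best) / qDiff;  // exact, since a - delta = 0 (mod qDiff)
}
```

For instance, with $q_{\mathrm{diff}} = 7$ and $t = 2$, the coefficient 1000 is scaled down to 142 (not to the nearest integer 143, which has the wrong parity), and 1001 is scaled down to 143.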

We provide some conversion routines to convert polynomials from coefficient representation
(NTL's ZZX format) to DoubleCRT and back, using the constructor

DoubleCRT(const ZZX&, const FHEcontext&, const IndexSet&);

and the conversion function ZZX to_ZZX(const DoubleCRT&). We also provide translation routines
between SingleCRT and DoubleCRT.

We support the usual set of arithmetic operations on DoubleCRT objects (e.g., addition,
multiplication, etc.), always working in $\mathbb{A}_q$ for some modulus $q$. We only implemented the “destructive”
two-argument version of these operations, where one of the input arguments is modified to return
the result. These arithmetic operations can only be applied to DoubleCRT objects relative to the
same FHEcontext, else an error is raised.

On the other hand, the DoubleCRT class supports operations between objects with different
IndexSets, offering two options to resolve the differences. Our arithmetic operations take a boolean
flag matchIndexSets: when the flag is set to true (which is the default), the index set of the result is
the union of the index sets of the two arguments. When matchIndexSets=false, the index set
of the result is the same as the index set of *this, i.e., the argument that will contain the result
when the operation ends. The option matchIndexSets=true is slower, since it may require adding
primes to the two arguments. Below is a list of the arithmetic routines that we implemented:


DoubleCRT& Negate(const DoubleCRT& other);  // *this = -other
DoubleCRT& Negate();                        // *this = -*this;

DoubleCRT& operator+=(const DoubleCRT& other);  // Addition
DoubleCRT& operator+=(const ZZX& poly);         // expensive
DoubleCRT& operator+=(const ZZ& num);
DoubleCRT& operator+=(long num);

DoubleCRT& operator-=(const DoubleCRT& other);  // Subtraction
DoubleCRT& operator-=(const ZZX& poly);         // expensive
DoubleCRT& operator-=(const ZZ& num);
DoubleCRT& operator-=(long num);

// These are the prefix versions, ++dcrt and --dcrt.
DoubleCRT& operator++();
DoubleCRT& operator--();

// Postfix versions (return type is void, it is offered just for style)
void operator++(int);
void operator--(int);

DoubleCRT& operator*=(const DoubleCRT& other);  // Multiplication
DoubleCRT& operator*=(const ZZX& poly);         // expensive
DoubleCRT& operator*=(const ZZ& num);
DoubleCRT& operator*=(long num);

// Procedural equivalents, providing also the matchIndexSets flag
void Add(const DoubleCRT& other, bool matchIndexSets=true);
void Sub(const DoubleCRT& other, bool matchIndexSets=true);
void Mul(const DoubleCRT& other, bool matchIndexSets=true);

DoubleCRT& operator/=(const ZZ& num);  // Division by constant
DoubleCRT& operator/=(long num);

void Exp(long k);  // Small-exponent polynomial exponentiation

// Automorphism F(X) --> F(X^k) (with gcd(k,m)==1)
void automorph(long k);
DoubleCRT& operator>>=(long k);

We also provide methods for choosing random polynomials in DoubleCRT format, as follows:

void randomize(const ZZ* seed=NULL);

Fills each row $i \in$ getIndexSet() with random integers modulo $p_i$. This procedure uses the
NTL PRG, setting the seed to the seed argument if it is non-NULL, and using the current
PRG state of NTL otherwise.

void sampleSmall();

Draws a random polynomial with coefficients $-1, 0, 1$, and converts it to DoubleCRT format.
Each coefficient is chosen as 0 with probability 1/2, and as $\pm 1$ with probability 1/4 each.

void sampleHWt(long weight);

Draws a random polynomial with coefficients $-1, 0, 1$, and converts it to DoubleCRT format.
The polynomial is chosen at random subject to the condition that all but weight of its
coefficients are zero, and the non-zero coefficients are random in $\pm 1$.

void sampleGaussian(double stdev=3.2);

Draws a random polynomial with integer coefficients, and converts it to DoubleCRT format.
Each coefficient is chosen at random from a Gaussian distribution with zero mean and variance
$\mathrm{stdev}^2$, rounded to an integer.
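The three sampling distributions can be sketched with the C++ <random> library, producing coefficient vectors in the integer domain (before any conversion to DoubleCRT format). The functions below are hypothetical stand-ins for sampleSmall, sampleHWt and sampleGaussian: they use std::mt19937 rather than the NTL PRG, and sampleHWtSketch assumes weight <= n.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// 0 with probability 1/2, +1 and -1 with probability 1/4 each.
std::vector<long> sampleSmallSketch(std::size_t n, std::mt19937& rng) {
  std::uniform_int_distribution<int> dist(0, 3);
  std::vector<long> a(n);
  for (auto& c : a) {
    int r = dist(rng);
    c = (r < 2) ? 0 : (r == 2 ? 1 : -1);
  }
  return a;
}

// Exactly `weight` nonzero coefficients, each +1 or -1; assumes weight <= n.
std::vector<long> sampleHWtSketch(std::size_t n, std::size_t weight, std::mt19937& rng) {
  std::vector<std::size_t> pos(n);
  std::iota(pos.begin(), pos.end(), std::size_t{0});
  std::shuffle(pos.begin(), pos.end(), rng);  // random support positions
  std::bernoulli_distribution coin(0.5);
  std::vector<long> a(n, 0);
  for (std::size_t i = 0; i < weight; ++i) a[pos[i]] = coin(rng) ? 1 : -1;
  return a;
}

// Rounded Gaussian with mean 0 and standard deviation stdev (variance stdev^2).
std::vector<long> sampleGaussianSketch(std::size_t n, double stdev, std::mt19937& rng) {
  std::normal_distribution<double> dist(0.0, stdev);
  std::vector<long> a(n);
  for (auto& c : a) c = std::lround(dist(rng));
  return a;
}
```

Shuffling the full index range and keeping the first weight positions guarantees exactly weight nonzero coefficients, which a per-coefficient coin flip would only achieve on average.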

In addition to the above, we also provide the following methods:

DoubleCRT& SetZero();  // set to the constant zero

DoubleCRT& SetOne();   // set to the constant one

const FHEcontext& getContext() const;  // access to context

const IndexSet& getIndexSet() const;   // the current set of primes

void breakIntoDigits(vector<DoubleCRT>&, long) const;  // used in key-switching
The method breakIntoDigits above is described in Section 3.1.6, where we discuss key-switching.

The SingleCRT class. SingleCRT is a helper class, used to gain some efficiency in expensive DoubleCRT operations. A SingleCRT object is also defined relative to a fixed FHEcontext and a dynamic subset S of the small primes. This SingleCRT object holds an IndexMap of polynomials (in NTL’s ZZX format), where the i’th polynomial contains the coefficients modulo the i’th small prime in our list.

Although SingleCRT and DoubleCRT objects can interact in principle, translating back and forth is expensive, since it involves an FFT (or inverse FFT) modulo each of the primes. Hence support for interaction between them is limited to explicit conversions.

3  The  Crypto  Layer 

The third layer of our library contains the implementation of the actual BGV homomorphic cryptosystem, supporting homomorphic operations on the “native plaintext space” of polynomials in $\mathbb{A}_2$ (or more generally polynomials in $\mathbb{A}_{2^r}$ for some parameter $r$). We partitioned this layer (somewhat arbitrarily) into the Ctxt module that implements ciphertexts and ciphertext arithmetic, the FHE module that implements the public and secret keys and the key-switching matrices, and a helper KeySwitching module that implements some common strategies for deciding which key-switching matrices to generate. Two high-level design choices that we made in this layer are to implement ciphertexts as arbitrary-length vectors of polynomials, and to allow more than one secret key per instance of the system. These two choices are described in more detail in Sections 3.1 and 3.2 below, respectively.

3.1  The  Ctxt  module:  Ciphertexts  and  homomorphic  operations 

Recall that in the BGV cryptosystem, a “canonical” ciphertext relative to secret key $s \in \mathbb{A}$ is a vector of two polynomials $(c_0, c_1) \in \mathbb{A}_q^2$ (for the “current modulus” $q$), such that $m = [c_0 + c_1 s]_q$ is a polynomial with small coefficients, and the plaintext that is encrypted by this ciphertext is the binary polynomial $[m]_2 \in \mathbb{A}_2$. However, the library must also deal with “non-canonical” ciphertexts: for example, when multiplying two ciphertexts as above we get a vector of three polynomials $(c_0, c_1, c_2)$, which is decrypted by setting $m = [c_0 + c_1 s + c_2 s^2]_q$ and outputting $[m]_2$. Also, after a homomorphic automorphism operation we get a two-polynomial ciphertext $(c_0, c_1)$, but relative to the key $s' = \kappa(s)$ (where $\kappa$ is the same automorphism that we applied to the ciphertext, namely $s'(X) = s(X^t)$ for some $t \in \mathbb{Z}_m^*$).

To support all of these options, a ciphertext in our library consists of an arbitrary-length vector of “ciphertext parts”, where each part is a polynomial, and each part contains a “handle” that points to the secret key that this part should be multiplied by during decryption. Handles, parts, and ciphertexts are implemented using the classes SKHandle, CtxtPart, and Ctxt, respectively.

3.1.1  The  SKHandle  class 

An object of the SKHandle class “points” to one particular secret-key polynomial that should multiply one ciphertext part during decryption. Recall that we allow multiple secret keys per instance of the cryptosystem, and that we may need to reference powers of these secret keys (e.g. $s^2$ after multiplication) or polynomials of the form $s(X^t)$ (after automorphism). The general form of these secret-key polynomials is therefore $s_i^r(X^t)$, where $s_i$ is one of the secret keys associated with this instance, $r$ is the power of that secret key, and $t$ is the automorphism that we applied to it. To uniquely identify a single secret-key polynomial to be used upon decryption, we therefore keep the three integers $(i, r, t)$.

Accordingly,  a  SKHandle  object  has  three  integer  data  members,  powerOfS,  powerOfX,  and 
secretKeyID.  It  is  considered  a  reference  to  the  constant  polynomial  1  whenever  powerOfS=  0, 
irrespective  of  the  other  two  values.  Also,  we  say  that  a  SKHandle  object  points  to  a  base  secret 
key  if  it  has  powerOfS  =  powerOfX  =  1. 

Observe that when multiplying two ciphertext parts, we get a new ciphertext part that should be multiplied upon decryption by the product of the two secret-key polynomials. This gives us the following set of rules for multiplying SKHandle objects (i.e., computing the handle of the resulting ciphertext part after multiplication). Let {powerOfS, powerOfX, secretKeyID} and {powerOfS', powerOfX', secretKeyID'} be two handles to be multiplied; then we have the following rules:


•  If  one  of  the  SKHandle  objects  points  to  the  constant  1,  then  the  result  is  equal  to  the  other 
one. 


•  If neither points to one, then we must have secretKeyID = secretKeyID' and powerOfX = powerOfX', otherwise we cannot multiply. If these two equalities hold, then the result will also have the same $t$ = powerOfX and $i$ = secretKeyID, and it will have $r$ = powerOfS + powerOfS'.
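The two rules can be sketched as follows. The `Handle` struct and `mulHandles` function are hypothetical stand-ins that mirror the three data members and the `mul` semantics described in this section (result written into the first argument, `false` when the handles cannot be multiplied); they are not the library's SKHandle class.

```cpp
#include <cassert>

// Toy model of an SKHandle (i, r, t) and of the multiplication rules above.
struct Handle {
  long powerOfS;     // r: power of the secret key (0 means "constant 1")
  long powerOfX;     // t: automorphism X -> X^t applied to the key
  long secretKeyID;  // i: which secret key of this instance

  Handle(long s = 0, long x = 1, long id = 0)
      : powerOfS(s), powerOfX(x), secretKeyID(id) {}
  bool isOne() const { return powerOfS == 0; }
  bool isBase() const { return powerOfS == 1 && powerOfX == 1; }
};

// Handle of the product of two ciphertext parts, written into `out`.
// Returns false when the rules say the handles cannot be multiplied.
bool mulHandles(Handle& out, const Handle& a, const Handle& b) {
  if (a.isOne()) { out = b; return true; }  // 1 * h = h
  if (b.isOne()) { out = a; return true; }  // h * 1 = h
  if (a.secretKeyID != b.secretKeyID || a.powerOfX != b.powerOfX)
    return false;                           // different keys or automorphisms
  out = Handle(a.powerOfS + b.powerOfS, a.powerOfX, a.secretKeyID);
  return true;
}
```

For instance, multiplying two base handles yields a handle with powerOfS = 2, while a base handle and a handle for $s(X^3)$ cannot be multiplied.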

The  methods  provided  by  the  SKHandle  class  are  the  following: 


SKHandle(long powerS=0, long powerX=1, long sKeyID=0); // constructor
long getPowerOfS() const; // returns powerOfS

long getPowerOfX() const; // returns powerOfX

long getSecretKeyID() const; // returns secretKeyID


void setBase(); // set to point to a base secret key
void setOne(); // set to point to the constant 1
bool isBase() const; // does it point to base?




16.  Design  and  Implementation  of  a  Homomorphic-Encryption  Library 


bool isOne() const; // does it point to 1?


bool operator==(const SKHandle& other) const;
bool operator!=(const SKHandle& other) const;

bool mul(const SKHandle& a, const SKHandle& b); // multiply the handles

// result returned in *this, returns true if handles can be multiplied

3.1.2  The  CtxtPart  class 

A ciphertext part is a polynomial with a handle (that “points” to a secret-key polynomial). Accordingly, the class CtxtPart is derived from DoubleCRT, and includes an additional data member of type SKHandle. This class does not provide any methods beyond the ones that are provided by the base class DoubleCRT, except for access to the secret-key handle (and constructors that initialize it).

3.1.3  The  Ctxt  class 

A Ctxt object is always defined relative to a fixed public key and context, both of which must be supplied to the constructor and are fixed thereafter. As described above, a ciphertext contains a vector of parts, each part with its own handle. This representation is quite flexible; for example, you can in principle add ciphertexts that are defined with respect to different keys, as follows:

•  For parts of the two ciphertexts that point to the same secret-key polynomial (i.e., have the same handle), you just add the two DoubleCRT polynomials.

•  Parts in one ciphertext that do not have a counterpart in the other ciphertext are simply included in the result intact.

For example, suppose that you wanted to add the following two ciphertexts, one “canonical” and the other after an automorphism $X \mapsto X^3$:

$$c = (c_0[i=0, r=0, t=0],\ c_1[i=0, r=1, t=1])$$
and
$$c' = (c'_0[i=0, r=0, t=0],\ c'_3[i=0, r=1, t=3]).$$

Adding these ciphertexts, we obtain a three-part ciphertext,

$$c + c' = \big((c_0 + c'_0)[i=0, r=0, t=0],\ c_1[i=0, r=1, t=1],\ c'_3[i=0, r=1, t=3]\big).$$
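A toy model of this part-by-part addition is sketched below. It assumes a ciphertext is stored as a map from the handle triple $(i, r, t)$ to a coefficient vector modulo $q$, whereas the library keeps a vector of CtxtPart objects in DoubleCRT format; all names here are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// A ciphertext is modeled as a map from a handle (i, r, t) to coefficients mod q.
using HandleKey = std::tuple<long, long, long>;  // (secretKeyID, powerOfS, powerOfX)
using Poly = std::vector<long>;
using ToyCtxt = std::map<HandleKey, Poly>;

// Every handle with powerOfS == 0 denotes the constant 1; use one key for it.
HandleKey normalizeKey(const HandleKey& k) {
  return std::get<1>(k) == 0 ? std::make_tuple(0L, 0L, 0L) : k;
}

// Adds `other` into `self`. Coefficients are assumed to lie in [0, q),
// and the keys already in `self` are assumed to be normalized.
void addToyCtxt(ToyCtxt& self, const ToyCtxt& other, long q) {
  for (const auto& kv : other) {
    HandleKey k = normalizeKey(kv.first);
    const Poly& poly = kv.second;
    auto it = self.find(k);
    if (it == self.end()) {
      self[k] = poly;  // no counterpart: include the part intact
    } else {
      for (std::size_t j = 0; j < poly.size(); ++j)
        it->second[j] = (it->second[j] + poly[j]) % q;  // same handle: add
    }
  }
}
```

With the two ciphertexts of the example encoded this way, `addToyCtxt` produces three parts: the sum of the two parts pointing to 1, plus the parts with handles $[i=0, r=1, t=1]$ and $[i=0, r=1, t=3]$ unchanged.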

Similarly,  we  also  have  flexibility  in  multiplying  ciphertexts  using  a  tensor  product,  as  long  as  all 
the  pairwise  handles  of  all  the  parts  can  be  multiplied  according  to  the  rules  from  Section  3.1.1 
above. 

The Ctxt class therefore contains a data member vector<CtxtPart> parts that keeps all of the ciphertext parts. By convention, the first part, parts[0], always has a handle pointing to the constant polynomial 1. Also, we maintain the invariant that all the DoubleCRT objects in the parts of a ciphertext are defined relative to the same subset of primes, and the IndexSet for this subset is accessible as ctxt.getPrimeSet(). (The current BGV modulus for this ciphertext can be computed as q = ctxt.getContext().productOfPrimes(ctxt.getPrimeSet()).)
For reasons that will be discussed when we describe modulus switching in Section 3.1.5, we maintain the invariant that a ciphertext relative to current modulus $q$ and plaintext space $p$ has an extra factor of $(q \bmod p)$. Namely, when we decrypt the ciphertext vector $c$ using the secret-key vector $s$, we compute $m = [\langle s, c\rangle]_q$ and then output the plaintext $\tilde{m} = [q^{-1} \cdot m]_p$ (rather than just $\tilde{m} = [m]_p$). Note that this has no effect when the plaintext space is $p = 2$, since $q^{-1} \bmod p = 1$ in this case ($q$ is odd).
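A scalar toy model of this invariant may help. Below, a single integer stands in for the polynomial $\langle c, s\rangle$, a plaintext $\tilde{m}$ is encoded as $(q \bmod p) \cdot \tilde{m} + p \cdot e$, and decryption multiplies by $q^{-1} \bmod p$. The helper names are hypothetical; the last function also shows why a constant must be scaled by $(q \bmod p)$ before it is added, as addConstant does.

```cpp
#include <cassert>

// Representative of x modulo n in the centered interval (-n/2, n/2].
long long centerMod(long long x, long long n) {
  long long r = ((x % n) + n) % n;
  return (r > n / 2) ? r - n : r;
}

// Inverse of a modulo n via the extended Euclidean algorithm;
// assumes gcd(a, n) == 1.
long long invMod(long long a, long long n) {
  long long t = 0, newT = 1, r = n, newR = ((a % n) + n) % n;
  while (newR != 0) {
    long long quot = r / newR, tmp;
    tmp = t - quot * newT; t = newT; newT = tmp;
    tmp = r - quot * newR; r = newR; newR = tmp;
  }
  return ((t % n) + n) % n;
}

// Decryption: m = [<c,s>]_q, then output [q^{-1} * m]_p.
long long toyDecrypt(long long innerProduct, long long q, long long p) {
  long long m = centerMod(innerProduct, q);
  return centerMod(invMod(q % p, p) * centerMod(m, p), p);
}

// Adding a constant k: scale it by f = (q mod p) before adding it to <c,s>.
long long toyAddConstant(long long innerProduct, long long k, long long q,
                         long long p) {
  return innerProduct + (q % p) * k;
}
```

For example, with $q = 10405$ and $p = 7$ (so $q \bmod p = 3$), encoding $\tilde{m} = 3$ with noise $e = 5$ gives $3 \cdot 3 + 7 \cdot 5 = 44$, and `toyDecrypt(44, 10405, 7)` returns 3.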

The basic operations that we can apply to ciphertexts (beyond encryption and decryption) are addition, multiplication, addition and multiplication by constants, automorphism, key-switching and modulus switching. These operations are described in more detail later in this section.


3.1.4  Noise  estimate 


In addition to the vector of parts, a ciphertext also contains a heuristic estimate of the “noise variance”, kept in the noiseVar data member. Consider the polynomial $m = [\langle c, s\rangle]_q$ that we obtain during decryption (before the reduction modulo 2). Thinking of the ciphertext $c$ and the secret key $s$ as random variables, this also makes $m$ a random variable. The data member noiseVar is intended as an estimate of the second moment of the random variable $m(\tau_m)$, where $\tau_m = e^{2\pi i/m}$ is the principal complex primitive $m$-th root of unity. Namely, we have $\text{noiseVar} \approx E[|m(\tau_m)|^2] = E[m(\tau_m) \cdot \overline{m(\tau_m)}]$, with $\overline{m(\tau_m)}$ the complex conjugate of $m(\tau_m)$.

The reason that we keep an estimate of this second moment is that it gives a convenient handle on the $\ell_2$ canonical embedding norm of $m$, which is how we measure the noise magnitude in the ciphertext. Heuristically, the random variables $m(\tau_m^j)$ for all $j \in \mathbb{Z}_m^*$ behave as if they are identically distributed, hence the expected squared $\ell_2$ norm of the canonical embedding of $m$ is

$$E\left[\left(\|m\|_2^{\mathrm{canon}}\right)^2\right] = \sum_{j \in \mathbb{Z}_m^*} E\left[|m(\tau_m^j)|^2\right] \approx \phi(m) \cdot E\left[|m(\tau_m)|^2\right] \approx \phi(m) \cdot \text{noiseVar}.$$


As the $\ell_2$ norm of the canonical embedding of $m$ is larger by a $\sqrt{\phi(m)}$ factor than the $\ell_2$ norm of its coefficient vector, we use the condition $\sqrt{\text{noiseVar}} > q/2$ (with $q$ the current BGV modulus) as our decryption-error condition. The library never checks this condition during the computation, but it provides a method ctxt.isCorrect() that the application can use to check for it. The library does use the noise estimate when deciding which primes to add or remove during modulus switching; see the description of the multiplyBy method below.
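For power-of-two $m$, the $\sqrt{\phi(m)}$ relation between the two norms is exact (the canonical embedding is then a scaled isometry of the coefficient vector); for general $m$ it holds on average for random zero-mean coefficients, but not for every polynomial. The sketch below evaluates the canonical embedding directly so that the relation can be checked numerically; `canonNormSq` and `gcdLong` are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// gcd of two non-negative integers.
long gcdLong(long a, long b) {
  while (b != 0) { long t = a % b; a = b; b = t; }
  return a;
}

// Squared l2 norm of the canonical embedding of a(X) = sum_k a[k] X^k:
// the sum over j in Z_m^* of |a(tau_m^j)|^2, with tau_m = exp(2*pi*i/m).
double canonNormSq(const std::vector<long>& a, long m) {
  const double pi = std::acos(-1.0);
  double total = 0.0;
  for (long j = 1; j < m; ++j) {
    if (gcdLong(j, m) != 1) continue;  // primitive m-th roots only
    const std::complex<double> root = std::polar(1.0, 2.0 * pi * j / m);
    std::complex<double> value(0.0, 0.0), power(1.0, 0.0);
    for (long coeff : a) {             // direct evaluation of a(root)
      value += double(coeff) * power;
      power *= root;
    }
    total += std::norm(value);         // std::norm returns |value|^2
  }
  return total;
}
```

For $m = 16$ and any coefficient vector of length $\phi(16) = 8$, `canonNormSq` should return $8$ times the squared $\ell_2$ norm of the coefficients, up to floating-point rounding.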


Recalling that the $j$’th ciphertext part has a handle pointing to some $s_j^{r_j}(X^{t_j})$, we have that $m = [\langle c, s\rangle]_q = [\sum_j c_j s_j^{r_j}(X^{t_j})]_q$. A valid ciphertext vector in the BGV cryptosystem can always be written as $c = r + e$ such that $r$ is some masking vector satisfying $[\langle r, s\rangle]_q = 0$ and $e = (e_1, e_2, \ldots)$ is such that $\|e_j \cdot s_j^{r_j}(X^{t_j})\| \ll q$. We therefore have $m = [\langle c, s\rangle]_q = \sum_j e_j s_j^{r_j}(X^{t_j})$, and under the heuristic assumption that the “error terms” $e_j$ are independent of the keys we get

$$E[|m(\tau_m)|^2] = \sum_j E[|e_j(\tau_m)|^2] \cdot E[|s_j(\tau_m^{t_j})^{r_j}|^2] \approx \sum_j E[|e_j(\tau_m)|^2] \cdot E[|s_j(\tau_m)^{r_j}|^2].$$

The terms $E[|e_j(\tau_m)|^2]$ depend on the particular error polynomials that arise during the computation, and will be described when we discuss the specific operations. For the secret-key terms we use the estimate

$$E[|s(\tau_m)^r|^2] \approx r! \cdot H^r,$$
where $H$ is the Hamming weight of the secret-key polynomial $s$. For $r = 1$, it is easy to see that $E[|s(\tau_m)|^2] = H$: indeed, for every particular choice of the positions of the $H$ nonzero coefficients of $s$, the random variable $s(\tau_m)$ (defined over the choice of each of these coefficients as $\pm 1$) is a zero-mean complex random variable with variance exactly $H$, since it is a sum of exactly $H$ independent random variables, each obtained by multiplying a uniform $\pm 1$ by a complex constant of magnitude 1. For $r > 1$, it is clear that $E[|s(\tau_m)^r|^2] \ge E[|s(\tau_m)|^2]^r = H^r$, but the factor of $r!$ may not be clear. We obtained that factor experimentally for the most part, by generating many polynomials $s$ of some given Hamming weight and checking the magnitude of $s(\tau_m)$. We then validated this experimental result for the case $r = 2$ (which is the most common case when using our library), as described in the appendix. The rules that we use for computing and updating the data member noiseVar during the computation are described below.
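The experiment described above can be reproduced in miniature. The sketch below is a Monte Carlo estimate of $E[|s(\tau_m)^r|^2]$ for random $s$ with exactly $H$ coefficients in $\{\pm 1\}$; the choice of a prime $m$ (so that the $\phi(m) = m - 1$ coefficient positions are simply $0, \ldots, m-2$), the parameter values, and the function name are assumptions of this sketch. For $r = 1$ the estimate should be close to $H$, and for $r = 2$ close to $2H^2$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <numeric>
#include <random>
#include <vector>

// Monte Carlo estimate of E[|s(tau_m)^r|^2] over random polynomials s with
// exactly H nonzero coefficients in {+1, -1}. Assumes m is prime, so that
// phi(m) = m - 1 and s has coefficients at positions 0, ..., m - 2.
double momentEstimate(long m, long H, int r, int trials, unsigned seed) {
  const double pi = std::acos(-1.0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution positive(0.5);
  std::vector<long> pos(m - 1);
  std::iota(pos.begin(), pos.end(), 0L);
  double sum = 0.0;
  for (int trial = 0; trial < trials; ++trial) {
    std::shuffle(pos.begin(), pos.end(), rng);  // first H entries: support of s
    std::complex<double> value(0.0, 0.0);       // value = s(tau_m)
    for (long k = 0; k < H; ++k) {
      const double coeff = positive(rng) ? 1.0 : -1.0;
      value += coeff * std::polar(1.0, 2.0 * pi * double(pos[k]) / double(m));
    }
    sum += std::pow(std::norm(value), r);       // |s(tau_m)|^{2r} = |s(tau_m)^r|^2
  }
  return sum / trials;
}
```

As a heuristic remark: for large $H$, $s(\tau_m)$ behaves roughly like a circularly symmetric complex Gaussian $z$, and such a variable satisfies $E[|z|^{2r}] = r! \cdot (E[|z|^2])^r$, which is consistent with the $r!$ factor.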

Encryption. For a fresh ciphertext, encrypted using the public encryption key, we have $\text{noiseVar} = \sigma^2(1 + \phi(m)^2/2 + \phi(m)(H + 1))$, where $\sigma^2$ is the variance in our RLWE instances, and $H$ is the Hamming weight of the first secret key.

When the plaintext space modulus is $p > 2$, that quantity is larger by a factor of $p^2$. See Section 3.2.2 for the reason for these expressions.

Modulus-switching. The noise magnitude in the ciphertexts scales up as we add primes to the prime-set, while modulus-switching down involves both scaling down and adding some term (corresponding to the rounding errors of modulus switching). Namely, when adding more primes to the prime-set we scale up the noise estimate as $\text{noiseVar}' = \text{noiseVar} \cdot \Delta^2$, with $\Delta$ the product of the added primes.

When removing primes from the prime-set we scale down and add an extra term, setting $\text{noiseVar}' = \text{noiseVar}/\Delta^2 + \text{addedNoise}$, where the added-noise term is computed as follows: we go over all the parts in the ciphertext and consider their handles. For any part $j$ with a handle that points to $s_j^{r_j}(X^{t_j})$, where $s_j$ is a secret-key polynomial whose coefficient vector has Hamming weight $H_j$, we add a term $(p^2/12) \cdot \phi(m) \cdot (r_j)! \cdot H_j^{r_j}$. Namely, when modulus-switching down we set

$$\text{noiseVar}' = \text{noiseVar}/\Delta^2 + \frac{p^2}{12} \cdot \phi(m) \cdot \sum_j (r_j)! \cdot H_j^{r_j}.$$

See Section 3.1.5 for the reason for this expression.
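The encryption and modulus-switching rules above amount to a few lines of bookkeeping. The following is a hypothetical sketch: the function names and the representation of each ciphertext part by a pair $(r_j, H_j)$ are our own choices, not the library's interface.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// r! as a double (exact for the small powers that occur in practice).
double factorial(long r) {
  double f = 1.0;
  for (long i = 2; i <= r; ++i) f *= double(i);
  return f;
}

// Fresh public-key encryption: sigma^2 * (1 + phi(m)^2/2 + phi(m)*(H+1)),
// multiplied by p^2 when the plaintext modulus p is larger than 2.
double freshNoiseVar(double sigma2, double phim, double H, long p) {
  double v = sigma2 * (1.0 + phim * phim / 2.0 + phim * (H + 1.0));
  return (p > 2) ? v * double(p) * double(p) : v;
}

// Modulus-UP by primes whose product is delta: noiseVar' = noiseVar * delta^2.
double modUpNoiseVar(double noiseVar, double delta) {
  return noiseVar * delta * delta;
}

// Modulus-DOWN by primes whose product is delta:
// noiseVar' = noiseVar / delta^2 + (p^2/12) * phi(m) * sum_j (r_j)! * H_j^{r_j}.
// Each pair is (r_j, H_j) for one ciphertext part; a part pointing to 1 has
// r_j = 0 and therefore contributes 0! * H^0 = 1 to the sum.
double modDownNoiseVar(double noiseVar, double delta, long p, double phim,
                       const std::vector<std::pair<long, double>>& parts) {
  double sum = 0.0;
  for (const auto& part : parts)
    sum += factorial(part.first) * std::pow(part.second, double(part.first));
  return noiseVar / (delta * delta) + double(p) * double(p) / 12.0 * phim * sum;
}
```

For a canonical ciphertext, `parts` would hold one entry with $r = 0$ (the part pointing to 1) and one with $r = 1$ (the base key).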

Re-linearization/key-switching. When key-switching a ciphertext, we first modulus-switch down to remove all the “special primes” from the prime-set of the ciphertext if needed (cf. Section 2.7). Then the key-switching operation itself has the side effect of adding these “special primes” back. These two modulus-switching operations have the effect of scaling the noise down, then back up, with the added noise term as above. We then add yet another noise term, as follows:

The key-switching operation involves breaking the ciphertext into some number $n'$ of “digits” (see Section 3.1.6). For each digit $i$ of size $D_i$ and every ciphertext part that we need to switch (i.e., one that does not already point to 1 or a base secret key), we add a noise term $\phi(m)\sigma^2 \cdot p^2 \cdot D_i^2/4$, where $\sigma^2$ is the variance in our RLWE instances. Namely, if we need to switch $k$ parts and if $\text{noiseVar}'$ is the noise estimate after the modulus switching down and up as above, then our final noise estimate after key-switching is

$$\text{noiseVar}'' = \text{noiseVar}' + k \cdot \phi(m)\sigma^2 \cdot p^2 \cdot \sum_{i=1}^{n'} D_i^2/4,$$

where $D_i$ is the size of the $i$’th digit. See Section 3.1.6 for more details.
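A hypothetical sketch of the key-switching rule follows: one helper counts the parts that must be switched from their $(r, t)$ handle values, and the other applies the formula above. The names and the pair representation are assumptions of this sketch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Number of parts that must be switched: those whose handle (r, t) points to
// neither the constant 1 (r == 0) nor a base key (r == 1 and t == 1).
long countPartsToSwitch(const std::vector<std::pair<long, long>>& handles) {
  long k = 0;
  for (const auto& h : handles) {
    bool isOne = (h.first == 0);
    bool isBase = (h.first == 1 && h.second == 1);
    if (!isOne && !isBase) ++k;
  }
  return k;
}

// noiseVar'' = noiseVar' + k * phi(m) * sigma^2 * p^2 * sum_i D_i^2 / 4,
// where digitSizes holds D_1, ..., D_{n'} for the digits actually used.
double keySwitchNoiseVar(double noiseVarAfterModSwitch, long k, double phim,
                         double sigma2, long p,
                         const std::vector<double>& digitSizes) {
  double sumTerm = 0.0;
  for (double D : digitSizes) sumTerm += D * D / 4.0;
  return noiseVarAfterModSwitch +
         double(k) * phim * sigma2 * double(p) * double(p) * sumTerm;
}
```

For a ciphertext obtained by multiplying two canonical ciphertexts, only the part pointing to $s^2$ needs switching, so $k = 1$.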

addConstant. We roughly add the size of the constant to our noise estimate. The calling application can either specify the size of the constant, or else we use the default value $sz = \phi(m) \cdot (p/2)^2$. Recalling that when the current modulus is $q$ we need to scale up the constant by $q \bmod p$, we set $\text{noiseVar}' = \text{noiseVar} + (q \bmod p)^2 \cdot sz$.

multByConstant. We multiply our noise estimate by the size of the constant. Again, the calling application can either specify the size of the constant, or else we use the default value $sz = \phi(m) \cdot (p/2)^2$. Then we set $\text{noiseVar}' = \text{noiseVar} \cdot sz$.

Addition. We first add primes to the prime-set of the two arguments until they are both defined relative to the same prime-set (i.e. the union of the prime-sets of both arguments). Then we just add the noise estimates of the two arguments, namely $\text{noiseVar}' = \text{noiseVar} + \text{other.noiseVar}$.

Multiplication. We first remove primes from the prime-set of the two arguments until they are both defined relative to the same prime-set (i.e. the intersection of the prime-sets of both arguments). Then the noise estimate is set to the product of the noise estimates of the two arguments, multiplied by an additional factor which is computed as follows: let $r_1$ be the highest power of $s$ (i.e., the powerOfS value) in all the handles in the first ciphertext, and similarly let $r_2$ be the highest power of $s$ in all the handles in the second ciphertext; then the extra factor is $\binom{r_1+r_2}{r_1}$. Namely, we have $\text{noiseVar}' = \text{noiseVar} \cdot \text{other.noiseVar} \cdot \binom{r_1+r_2}{r_1}$. (In particular, if the two arguments are canonical ciphertexts then the extra factor is $\binom{2}{1} = 2$.) See Section 3.1.7 for more details.
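The multiplication rule can be sketched as follows (hypothetical names). The last helper encodes the key-moment estimate $r! \cdot H^r$ from Section 3.1.4, so one can check that the factor $\binom{r_1+r_2}{r_1}$ turns $r_1! H^{r_1} \cdot r_2! H^{r_2}$ into exactly $(r_1+r_2)! H^{r_1+r_2}$.

```cpp
#include <cassert>
#include <cmath>

// Binomial coefficient C(n, k) as a double; each intermediate value is the
// integer C(n - k + i, i), so the result is exact for small arguments.
double binom(long n, long k) {
  double b = 1.0;
  for (long i = 1; i <= k; ++i) b = b * double(n - k + i) / double(i);
  return b;
}

// Raw-multiplication rule: product of the two estimates times C(r1 + r2, r1),
// with r1, r2 the largest powerOfS values among the handles of each argument.
double mulNoiseVar(double noiseVar1, long r1, double noiseVar2, long r2) {
  return noiseVar1 * noiseVar2 * binom(r1 + r2, r1);
}

// The heuristic E|s(tau_m)^r|^2 ~ r! * H^r.
double keyMoment(long r, double H) {
  double f = 1.0;
  for (long i = 2; i <= r; ++i) f *= double(i);
  return f * std::pow(H, double(r));
}
```

For two canonical ciphertexts ($r_1 = r_2 = 1$), `mulNoiseVar` simply doubles the product of the two estimates.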

Automorphism.  The  noise  estimate  does  not  change  by  an  automorphism  operation. 

3.1.5  Modulus-switching  operations 

Our library supports modulus-switching operations, both adding and removing small primes from the current prime-set of a ciphertext. In fact, our decision to include an extra factor of $(q \bmod p)$ in a ciphertext relative to current modulus $q$ and plaintext-space modulus $p$ is mainly intended to somewhat simplify these operations.

To add primes, we just apply the operation addPrimesAndScale to all the ciphertext parts (which are polynomials in Double-CRT format). This has the effect of multiplying the ciphertext by the product of the added primes, which we denote here by $\Delta$, and we recall that this operation is relatively cheap (as it involves no FFTs or CRTs, cf. Section 2.8). Denote the current modulus before the modulus-UP transformation by $q$, and the current modulus after the transformation by $q' = q \cdot \Delta$. If before the transformation we have $[\langle c, s\rangle]_q = m$, then after this transformation we have $\langle c', s\rangle = \langle \Delta \cdot c, s\rangle = \Delta \cdot \langle c, s\rangle$, and therefore $[\langle c', s\rangle]_{q'} = \Delta \cdot m$. This means that if before the transformation we had by our invariant $[\langle c, s\rangle]_q = m \equiv q \cdot \tilde{m} \pmod p$, then after the transformation we have $[\langle c', s\rangle]_{q'} = \Delta \cdot m \equiv q' \cdot \tilde{m} \pmod p$, as needed.

For a modulus-DOWN operation (i.e., removing primes) from the current modulus $q$ to the smaller modulus $q'$, we need to scale the ciphertext $c$ down by a factor of $\Delta = q/q'$ (thus getting a fractional ciphertext), then round appropriately to get back an integer ciphertext. Using our invariant about the extra factor of $(q \bmod p)$ in a ciphertext relative to modulus $q$ (and plaintext-space modulus $p$), we need to convert $c$ into another ciphertext vector $c'$ satisfying (a) $(q')^{-1} c' \equiv q^{-1} c \pmod p$, and (b) the “rounding error term” $\epsilon = c' - (q'/q)c$ is small. As described in [5], we apply the following optimized procedure:

1. Let $\delta = c \bmod \Delta$,

2. Add or subtract multiples of $\Delta$ from the coefficients in $\delta$ until it is divisible by $p$,

3. Set $c^* = c - \delta$, which is divisible by $\Delta$, and $c^* \equiv c \pmod p$,

4. Output $c' = c^*/\Delta$.

An argument similar to the proof of [2, Lemma 4] shows that if before the transformation we had $m = [\langle c, s\rangle]_q \equiv q \cdot \tilde{m} \pmod p$, then after the transformation we have $m' = [\langle c', s\rangle]_{q'} \equiv q' \cdot \tilde{m} \pmod p$, as needed. (The difference from [2, Lemma 4] is that we do not assume that $q \equiv q' \equiv 1 \pmod p$.)
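The four steps can be sketched on integer coefficients as follows. This toy version assumes $\gcd(\Delta, p) = 1$, interprets $c \bmod \Delta$ as the centered residue, and performs Step 2 by adding a single multiple $j\Delta$ with $j \equiv -\delta\Delta^{-1} \pmod p$ chosen in the centered range, which has the same effect as repeated additions or subtractions. The function names are hypothetical and this is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy integer version of the modulus-DOWN steps, applied coefficient-wise to
// one polynomial c given relative to q = q' * delta. Assumes gcd(delta, p) == 1
// and that all intermediate values fit in a long long.

long long centeredMod(long long x, long long n) {
  long long r = ((x % n) + n) % n;
  return (r > n / 2) ? r - n : r;
}

long long inverseMod(long long a, long long n) {  // extended Euclid
  long long t = 0, newT = 1, r = n, newR = ((a % n) + n) % n;
  while (newR != 0) {
    long long quot = r / newR, tmp;
    tmp = t - quot * newT; t = newT; newT = tmp;
    tmp = r - quot * newR; r = newR; newR = tmp;
  }
  return ((t % n) + n) % n;
}

std::vector<long long> modDown(const std::vector<long long>& c,
                               long long delta, long long p) {
  const long long deltaInv = inverseMod(delta % p, p);
  std::vector<long long> out;
  out.reserve(c.size());
  for (long long ci : c) {
    long long d = centeredMod(ci, delta);         // 1. d = c mod delta
    long long j = centeredMod(-d * deltaInv, p);  // 2. j with d + j*delta = 0 (mod p)
    d += j * delta;                               //    d is unchanged mod delta
    long long cStar = ci - d;                     // 3. divisible by delta, = c (mod p)
    out.push_back(cStar / delta);                 // 4. exact division
  }
  return out;
}
```

Under these choices each output coefficient satisfies $|c' \Delta - c| \le (p+1)\Delta/2$, i.e. the per-coefficient rounding error is at most $(p+1)/2$.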

Considering the noise magnitude, we can write $c' = c/\Delta + \epsilon$, where $\epsilon$ is the rounding error (i.e., the terms that are added in Step 2 above, divided by $\Delta$). The noise polynomial is thus scaled down by a $\Delta$ factor, then increased by the additive term $a = \langle \epsilon, s\rangle = \sum_j \epsilon_j(X) \cdot s_j^{r_j}(X^{t_j})$ (with $a \in \mathbb{A}$). We make the heuristic assumption that the coefficients in all the $\epsilon_j$’s behave as if they are chosen uniformly in the interval $[-p/2, p/2)$. Under this assumption, we have

$$E\left[|\epsilon_j(\tau_m)|^2\right] = \phi(m) \cdot p^2/12,$$

since the variance of a uniform random variable in $[-p/2, p/2)$ is $p^2/12$, and $\epsilon_j(\tau_m)$ is a sum of $\phi(m)$ such variables, scaled by different magnitude-1 complex constants. Assuming heuristically that the $\epsilon_j$’s are independent of the public key, we have

$$E\left[|a(\tau_m)|^2\right] = \sum_j E\left[|\epsilon_j(\tau_m)|^2\right] \cdot E\left[|s_j^{r_j}(\tau_m^{t_j})|^2\right] \approx \sum_j \left(\phi(m) \cdot p^2/12\right) \cdot (r_j)! \cdot H_j^{r_j},$$

where $p$ is the plaintext-space modulus, $H_j$ is the Hamming weight of the secret key for the $j$’th part, and $r_j$ is the power of that secret key.

3.1.6  Key-switching/re-linearization

The re-linearization operation ensures that all the ciphertext parts have handles that point to either the constant 1 or a base secret key: any ciphertext part $j$ with a handle pointing to $s_j^{r_j}(X^{t_j})$ with either $r_j > 1$, or $r_j = 1$ and $t_j > 1$, is replaced by two parts, one that points to 1 and the other that points to $s_j(X)$, using some key-switching matrices from the public key. Also, a side effect of re-linearization is that we add all the “special primes” to the prime-set of the resulting ciphertext.

To explain the re-linearization procedure, we begin by recalling that the “ciphertext primes” that define our moduli chain are partitioned into some number $n > 1$ of “digits” of roughly equal size. (For example, say that we have 15 small primes in the chain and we partition them into three digits; then we may take the first five primes to be the first digit, the next five primes to be the second, and the last five primes to be the third.) The size of a digit is the product of all the primes that are associated with it, and below we denote by $D_i$ the size of the $i$’th digit.
When key-switching a ciphertext $c$ using $n > 1$ digits, we begin by breaking $c$ into a collection of (at most) $n$ lower-norm ciphertexts $c_i$. First we remove all the special primes from the prime-set of the ciphertext by modulus-DOWN, if needed. Then we determine the smallest $n'$ such that the product of the first $n'$ digits exceeds the current modulus $q$, and then we set

1. $d_1 := c$

2. For $i = 1$ to $n'$ do:

3. $\quad c_i := d_i \bmod D_i$ // $|c_i| \le D_i/2$

4. $\quad d_{i+1} := (d_i - c_i)/D_i$ // $(d_i - c_i)$ is divisible by $D_i$

Note that $c = \sum_{i=1}^{n'} c_i \cdot \prod_{k<i} D_k$, and also since $\|c\|_\infty \le q/2 \le (\prod_i D_i)/2$, it follows that $\|c_i\|_\infty \le D_i/2$ for all $i$. Below we assume that the current modulus $q$ is equal to the product of the first $n'$ digits. (The case where $q < \prod_i D_i$ is very similar, but requires somewhat more complicated notation; in that case we just remove primes from the last digit until the product is equal to $q$.)
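The digit decomposition in steps 1 to 4 can be illustrated on a single integer coefficient. In this toy sketch the names are hypothetical (the library's routine for this is DoubleCRT::breakIntoDigits, which works on whole polynomials), and `balancedMod` implements $d_i \bmod D_i$ as the centered residue, which gives $|c_i| \le D_i/2$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Centered residue of x modulo n, in (-n/2, n/2].
long long balancedMod(long long x, long long n) {
  long long r = ((x % n) + n) % n;
  return (r > n / 2) ? r - n : r;
}

// Breaks c into balanced digits c_i with |c_i| <= D_i/2.
std::vector<long long> breakIntoDigitsToy(long long c,
                                          const std::vector<long long>& D) {
  std::vector<long long> digits;
  long long d = c;                      // d_1 := c
  for (long long Di : D) {
    long long ci = balancedMod(d, Di);  // c_i := d_i mod D_i
    digits.push_back(ci);
    d = (d - ci) / Di;                  // d_{i+1} := (d_i - c_i) / D_i, exact
  }
  return digits;
}

// Recombines the digits as c = sum_i c_i * prod_{k<i} D_k.
long long recombineDigits(const std::vector<long long>& digits,
                          const std::vector<long long>& D) {
  long long sum = 0, scale = 1;
  for (std::size_t i = 0; i < digits.size(); ++i) {
    sum += digits[i] * scale;
    scale *= D[i];
  }
  return sum;
}
```

With odd digit sizes, for example $D = (7, 11, 13)$ so that the product is 1001, every integer $c$ with $|c| \le 500$ is recovered exactly by `recombineDigits`.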

Consider now one particular ciphertext part $c_j$ in $c$, with a handle that points to some $s'_j(X) = s_j^{r_j}(X^{t_j})$, with either $r_j > 1$, or $r_j = 1$ and $t_j > 1$. Let us denote by $c_{i,j}$ the ciphertext parts corresponding to $c_j$ in the low-norm ciphertexts $c_i$. That is, we have $c_j = \sum_i c_{i,j} \cdot \prod_{k<i} D_k$ and also $\|c_{i,j}\|_\infty \le D_i/2$ for all $i$. Moreover, we need to have in the public key a key-switching matrix for that handle, $W = W[s'_j \to s_j]$. This $W$ is a $2 \times n$ matrix of polynomials in Double-CRT format, defined relative to the product of all the small primes in our chain (special or otherwise). Below we denote the product of all these small primes by $Q$. The $i$’th column in the matrix encrypts the “plaintext polynomial” $s'_j \cdot \prod_{k<i} D_k$ under the key $s_j$; namely, it is a vector $(a_i, b_i)^T \in \mathbb{A}_Q^2$ such that $[b_i + a_i s_j]_Q = (\prod_{k<i} D_k) \cdot s'_j + p \cdot e_i$, for a small polynomial $e_i$ (and the plaintext-space modulus $p$). Moreover, as long as $(\prod_{k<i} D_k) \cdot s'_j + p \cdot e_i$ is short enough, the same equality also holds modulo smaller moduli that divide $Q$. In particular, denoting the product of the “special primes” by $Q^*$ and the product of the $n'$ digits that we use by $q$, then for all $i \le n'$ we have $\|(\prod_{k<i} D_k) \cdot s'_j + p \cdot e_i\|_\infty < qQ^*/2$, and therefore

$$[b_i + a_i s_j]_{qQ^*} = \Big(\prod_{k<i} D_k\Big) \cdot s'_j + p \cdot e_i.$$

We therefore reduce the key-switching matrix modulo $qQ^*$, and add the small primes corresponding to $qQ^*$ to all the $c_{i,j}$’s. We replace $c_j$ by ciphertext parts that point to 1 and to the base key, by multiplying the $n'$-vector $\vec{d}_j = (c_{1,j}, \ldots, c_{n',j})^T$ by (the first $n'$ columns of) the key-switching matrix $W$, setting

$$(c'_j, c''_j)^T = [W[1:n'] \times \vec{d}_j]_{qQ^*} = \Big[\sum_{i=1}^{n'} (a_i, b_i)^T \cdot c_{i,j}\Big]_{qQ^*}.$$

It is not hard to see that for these two new ciphertext parts we have:

$$c''_j + c'_j s_j = \sum_{i=1}^{n'} (b_i + a_i s_j) \cdot c_{i,j} = \sum_{i=1}^{n'} \Big(\Big(\prod_{k<i} D_k\Big) \cdot s'_j + p \cdot e_i\Big) \cdot c_{i,j}$$
$$= \Big(\sum_{i=1}^{n'} \Big(\prod_{k<i} D_k\Big) c_{i,j}\Big) \cdot s'_j + p \cdot \sum_{i=1}^{n'} e_i c_{i,j} = c_j s'_j + p \cdot \sum_{i=1}^{n'} e_i c_{i,j} \pmod{qQ^*}.$$

Replacing all the parts $c_j$ by pairs $(c''_j, c'_j)^T$ as above (and adding up all the parts that point to 1, as well as all the parts that point to the base $s_j$), we thus get a canonical ciphertext $\tilde{c} = (\tilde{c}_0, \tilde{c}_1)^T$, with $\tilde{c}_0 = [\sum_j c''_j]_{qQ^*}$ and $\tilde{c}_1 = [\sum_j c'_j]_{qQ^*}$, and we have

$$\tilde{c}_0 + \tilde{c}_1 s_j = \sum_j c_j s'_j + p \Big(\sum_{i,j} e_i c_{i,j}\Big) = m + p \Big(\sum_{i,j} e_i c_{i,j}\Big) \pmod{qQ^*}.$$

Hence, as long as the additive term $p(\sum_{i,j} e_i c_{i,j})$ is small enough, decrypting the new ciphertext yields the same plaintext value modulo $p$ as decrypting the original ciphertext $c$.

In terms of noise magnitude, we first scale up the noise by a factor of $Q^*$ when adding all the special primes, and then add the extra noise term $p \cdot \sum_{i,j} e_i c_{i,j}$. Since the $c_{i,j}$’s have coefficients of magnitude at most $D_i/2$ and the polynomials $e_i$ are RLWE error terms with zero-mean coefficients and variance $\sigma^2$, the second moment of $e_i(\tau_m) \cdot c_{i,j}(\tau_m)$ is no more than $\phi(m)\sigma^2 \cdot D_i^2/4$. Thus for every ciphertext part that we need to switch (i.e. that has a handle that points to something other than 1 or base), we add a term of $\phi(m)\sigma^2 \cdot p^2 \cdot D_i^2/4$. Therefore, if our noise estimate after the scale-up is $\text{noiseVar}'$ and we need to switch $k$ ciphertext parts, then our noise estimate after key-switching is

$$\text{noiseVar}'' = \text{noiseVar}' + k \cdot \phi(m)\sigma^2 \cdot p^2 \cdot \sum_{i=1}^{n'} D_i^2/4.$$

3.1.7  Native  arithmetic  operations 

The native arithmetic operations that can be performed on ciphertexts are negation, addition/subtraction, multiplication, addition of constants, multiplication by constant, and automorphism. In our library we expose to the application both the operations in their “raw form” without any additional modulus- or key-switching, as well as some higher-level interfaces for multiplication and automorphisms that also include modulus- and key-switching.

Negation. The method Ctxt::negate() transforms an encryption of a polynomial $m \in \mathbb{A}_p$ into an encryption of $[-m]_p$, simply by negating all the ciphertext parts modulo the current modulus. (Of course this has an effect on the plaintext only when $p > 2$.) The noise estimate is unaffected.

Addition/subtraction. Both of these operations are implemented by the single method
void Ctxt::addCtxt(const Ctxt& other, bool negative=false);
depending on the negative boolean flag. For convenience, we provide the methods Ctxt::operator+= and Ctxt::operator-= that call addCtxt with the appropriate flag. A side effect of this operation is that the prime-set of *this is set to the union of the prime-sets of both ciphertexts. After this scaling (if needed), every ciphertext part in other that has a matching part in *this (i.e. a part with the same handle) is added to this matching part, and any part in other without a match is just appended to *this. We also add the noise estimates of both ciphertexts.

Constant addition. Implemented by the methods

void Ctxt::addConstant(const ZZX& poly, double size=0.0);
void Ctxt::addConstant(const DoubleCRT& poly, double size=0.0);

The constant is scaled by a factor $f = (q \bmod p)$, with $q$ the current modulus and $p$ the plaintext-space modulus (to maintain our invariant that a ciphertext relative to $q$ has this extra factor), then added to the part of *this that points to 1. The calling application can specify the size of the added constant (i.e. $|\text{poly}(\tau_m)|^2$), or else we use the default value $\text{size} = \phi(m) \cdot (p/2)^2$, and this value (times $f^2$) is added to our noise estimate.

Multiplication by constant. Implemented by the methods

void Ctxt::multByConstant(const ZZX& poly, double size=0.0);
void Ctxt::multByConstant(const DoubleCRT& poly, double size=0.0);

All the parts of *this are multiplied by the constant, and the noise estimate is multiplied by the size of the constant. As before, the application can specify the size, or else we use the default value $\text{size} = \phi(m) \cdot (p/2)^2$.

Multiplication.  “Raw”  multiplication  is  implemented  by 
Ctxt& Ctxt::operator*=(const Ctxt& other);

If needed, we modulus-switch down to the intersection of the prime-sets of both arguments, then
compute the tensor product of the two, namely the collection of all pairwise products of parts from
*this and other.

The noise estimate of the result is the product of the noise estimates of the two arguments, times
a factor which is computed as follows: let $r_1$ be the highest power of $s$ (i.e., the powerOfS value)
in all the handles in *this, and similarly let $r_2$ be the highest power of $s$ in all the handles in other.
The extra factor is then set as $\binom{r_1+r_2}{r_1}$. Namely, $\mathrm{noiseVar}' = \mathrm{noiseVar}\cdot\mathrm{other.noiseVar}\cdot\binom{r_1+r_2}{r_1}$.
The reason for the $\binom{r_1+r_2}{r_1}$ factor is that the ciphertext part in the result obtained by multiplying
the two parts with the highest powerOfS values will have powerOfS value equal to the sum, $r = r_1 + r_2$.
Recall from Section 3.1.4 that we estimate $E[|s(\tau_m)^r|^2] \approx r!\cdot H^r$, where $H$ is the Hamming weight
of the coefficient-vector of $s$. Thus our noise estimate for the relevant part in *this is $r_1!\cdot H^{r_1}$ and
the estimate for the part in other is $r_2!\cdot H^{r_2}$. To obtain the desired estimate of $(r_1+r_2)!\cdot H^{r_1+r_2}$,
we need to multiply the product of the estimates by the extra factor $\frac{(r_1+r_2)!}{r_1!\,r_2!} = \binom{r_1+r_2}{r_1}$.¹
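The factor is an ordinary binomial coefficient, so it can be computed without factorials. A short sketch (productNoiseFactor and productNoiseVar are hypothetical helper names, and plain double stands in for the library's noise type):

```cpp
#include <cassert>

// Extra noise factor for raw multiplication: C(r1 + r2, r1), where r1 and r2
// are the largest powerOfS values among the handles of the two arguments.
// It equals (r1+r2)! H^(r1+r2) / (r1! H^r1 * r2! H^r2); the H terms cancel.
double productNoiseFactor(long r1, long r2) {
    assert(r1 >= 0 && r2 >= 0);
    double c = 1.0;
    // Multiplicative formula C(r1 + r2, r2) = prod_{i=1..r2} (r1 + i) / i;
    // every intermediate value is itself a binomial coefficient (an integer).
    for (long i = 1; i <= r2; ++i)
        c = c * static_cast<double>(r1 + i) / static_cast<double>(i);
    return c;
}

// noiseVar' = noiseVar * other.noiseVar * C(r1 + r2, r1)
double productNoiseVar(double noiseVar, double otherNoiseVar, long r1, long r2) {
    return noiseVar * otherNoiseVar * productNoiseFactor(r1, r2);
}
```

For two canonical ciphertexts ($r_1 = r_2 = 1$) the factor is 2.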

Higher-level  multiplication.  We  also  provide  the  higher-level  methods 
void Ctxt::multiplyBy(const Ctxt& other);
void Ctxt::multiplyBy(const Ctxt& other1, const Ctxt& other2);

The first method multiplies two ciphertexts: it begins by removing primes from the two arguments
down to a level where the rounding-error from modulus-switching is the dominating noise term (see
findBaseSet below), then it calls the low-level routine to compute the tensor product, and finally
it calls the reLinearize method to get back a canonical ciphertext.

The second method, which multiplies three ciphertexts, also begins by removing primes from the
two arguments down to a level where the rounding-error from modulus-switching is the dominating
noise term. Based on the prime-sets of the three ciphertexts it chooses an order in which to multiply them
(so that ciphertexts at higher levels are multiplied first). Then it calls the tensor-product routine
to multiply the three ciphertexts in that order, and then re-linearizes the result.

We also provide the two convenience methods square and cube that call the above two-argument
and three-argument multiplication routines, respectively.

¹ Although products of other pairs of parts may need a smaller factor, the parts with the highest powerOfS value
represent the largest contribution to the overall noise; hence we use this largest factor for everything.
Automorphism. “Raw” automorphism is implemented in the method
void Ctxt::automorph(long k);

For convenience we also provide Ctxt& operator>>=(long k); which does the same thing. These
methods just apply the automorphism $X \mapsto X^k$ to every part of the current ciphertext, without
changing the noise estimate, and multiply by $k$ (modulo $m$) the powerOfX value in the handles of
all these parts.
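To see what $X \mapsto X^k$ does to a single part, it helps to work modulo $X^m - 1$, where the map just permutes coefficients: the coefficient of $X^i$ moves to $X^{ik \bmod m}$. The sketch below assumes that representation (the library works in $\mathbb{Z}[X]/\Phi_m(X)$ and its own data layouts); because $\Phi_m(X)$ divides $X^m - 1$, reducing the result modulo $\Phi_m(X)$ afterwards, which is not shown, yields the automorphism in the ring. applyAutomorphism and updatePowerOfX are hypothetical names.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Applies X -> X^k to a(X), given by its m coefficients modulo X^m - 1.
// Since Phi_m(X) divides X^m - 1, a later reduction modulo Phi_m(X)
// (not shown) gives the automorphism in Z[X]/Phi_m(X).
std::vector<long> applyAutomorphism(const std::vector<long>& a, long k, long m) {
    assert(static_cast<long>(a.size()) == m && std::gcd(k, m) == 1);
    std::vector<long> out(m, 0);
    for (long i = 0; i < m; ++i) {
        long j = ((i * k) % m + m) % m;   // X^i -> X^(i*k mod m), a permutation
        out[j] += a[i];
    }
    return out;
}

// The handle bookkeeping: powerOfX is multiplied by k modulo m.
long updatePowerOfX(long powerOfX, long k, long m) {
    return ((powerOfX * k) % m + m) % m;
}
```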

“Smart”  Automorphism.  Higher-level  automorphism  is  implemented  in  the  method 
void Ctxt::smartAutomorph(long k);

The difference between automorph and smartAutomorph is that the latter ensures that the result
can be re-linearized using key-switching matrices from the public key. Specifically, smartAutomorph
breaks the automorphism $X \mapsto X^k$ into some number $t \ge 1$ of steps, $X \mapsto X^{k_i}$ for $i = 1, 2, \ldots, t$,
such that the public key contains key-switching matrices for re-linearizing all these steps (i.e.,
$W[s(X^{k_i}) \Rightarrow s(X)]$), and at the same time we have $\prod_{i=1}^{t} k_i \equiv k \pmod m$. The method
smartAutomorph then begins by re-linearizing its argument, then in every step it performs one of
the automorphisms $X \mapsto X^{k_i}$ followed by re-linearization.

The decision of how to break each exponent $k$ into a sequence of $k_i$'s as above is made offline
during key-generation, as described in Section 3.2.2. After this offline computation, the public key
contains a table that, for each $k \in \mathbb{Z}_m^*$, indicates the first step to take when implementing the
automorphism $X \mapsto X^k$. The method smartAutomorph looks up the first step $k_1$ in that table, performs
the automorphism $X \mapsto X^{k_1}$, then computes $k' = k/k_1 \bmod m$ and does another lookup in the table
for the first step relative to $k'$, and so on.
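The update $k' = k/k_1 \bmod m$ needs the inverse of $k_1$ in $\mathbb{Z}_m^*$. A minimal sketch using the extended Euclidean algorithm; invMod and remainingExponent are hypothetical names, not library functions.

```cpp
#include <cassert>

// Inverse of a modulo m via the extended Euclidean algorithm.
// Invariant: t_i * a == r_i (mod m) for both tracked rows.
long invMod(long a, long m) {
    long r0 = m, r1 = ((a % m) + m) % m;
    long t0 = 0, t1 = 1;
    while (r1 != 0) {
        long qt = r0 / r1;
        long r2 = r0 - qt * r1; r0 = r1; r1 = r2;
        long t2 = t0 - qt * t1; t0 = t1; t1 = t2;
    }
    assert(r0 == 1);                 // a must be a unit modulo m
    return ((t0 % m) + m) % m;
}

// Remaining exponent after performing the first step k1: k' = k / k1 mod m.
long remainingExponent(long k, long k1, long m) {
    return ((k % m) + m) % m * invMod(k1, m) % m;
}
```

For example, with $m = 7$, $k = 6$ and a first step $k_1 = 3$, the remaining exponent is $k' = 2$, and indeed $3 \cdot 2 \equiv 6 \pmod 7$.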

3.1.8  More  Ctxt  methods 

The Ctxt class also provides the following utility methods:

void clear(); Removes all the parts and sets the noise estimate to zero.

xdouble modSwitchAddedNoiseVar() const; computes the added noise from modulus-switching,
namely it returns $\sum_j (\phi(m)\cdot p^2/12)\cdot (r_j)!\cdot H_j^{r_j}$, where $H_j$ and $r_j$ are respectively the Hamming
weight of the secret key that the j'th ciphertext-part points to, and the power of that secret
key (i.e., the powerOfS value in the relevant handle).
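The sum can be evaluated part by part. This is a sketch with a hypothetical PartInfo type holding just the two handle-derived quantities, and plain double in place of xdouble.

```cpp
#include <cassert>
#include <vector>

// The two per-part quantities the estimate depends on (hypothetical type).
struct PartInfo {
    long powerOfS;       // r_j
    double hammingWt;    // H_j, Hamming weight of the key the part points to
};

// sum_j (phi(m) * p^2 / 12) * r_j! * H_j^{r_j}
double modSwitchAddedNoiseVarToy(const std::vector<PartInfo>& parts, long phiM, long p) {
    assert(phiM > 0 && p > 0);
    const double base = static_cast<double>(phiM) * p * p / 12.0;
    double total = 0.0;
    for (const PartInfo& part : parts) {
        double term = base;
        for (long i = 1; i <= part.powerOfS; ++i)
            term *= static_cast<double>(i) * part.hammingWt;  // builds r! * H^r
        total += term;
    }
    return total;
}
```

A part that points to 1 has $r_j = 0$ and contributes just $\phi(m)p^2/12$, so a canonical ciphertext gets $(\phi(m)p^2/12)(1 + H)$.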

void findBaseSet(IndexSet& s) const; Returns in s the largest prime-set such that modulus-switching
to s would make ctxt.modSwitchAddedNoiseVar the most significant noise term.
In other words, modulus-switching to s results in significantly smaller noise than switching to any
larger prime-set, but modulus-switching further down would not reduce the noise by much.
When multiplying ciphertexts using the “high-level” multiplyBy methods, the ciphertexts
are reduced to (the intersection of) their base-set levels before multiplying.

long getLevel() const; Returns the number of primes in the result of findBaseSet.

bool inCanonicalForm(long keyID=0) const; Returns true if this is a canonical ciphertext,
with only two parts: one that points to 1 and the other that points to the “base” secret key
$s_i(X)$ (where $i$ = keyID is specified by the caller).
bool isCorrect() const; and double log_of_ratio() const; The method isCorrect() compares
the noise estimate to the current modulus, and returns true if the noise estimate is
less than half the modulus size, specifically if $\sqrt{\mathrm{noiseVar}} < q/2$. The method double
log_of_ratio() returns $\log(\mathrm{noiseVar})/2 - \log(q)$.
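Both checks reduce to one comparison between the estimated noise magnitude $\sqrt{\mathrm{noiseVar}}$ and $q$. A sketch with hypothetical names and plain double (the library keeps noiseVar as xdouble because real values can exceed the range of double):

```cpp
#include <cassert>
#include <cmath>

// Estimated noise magnitude sqrt(noiseVar) must stay below q/2.
bool isCorrectToy(double noiseVar, double q) {
    assert(noiseVar >= 0.0 && q > 0.0);
    return std::sqrt(noiseVar) < q / 2.0;
}

// log(noiseVar)/2 - log(q), i.e. log(sqrt(noiseVar) / q).
double logOfRatioToy(double noiseVar, double q) {
    assert(noiseVar > 0.0 && q > 0.0);
    return std::log(noiseVar) / 2.0 - std::log(q);
}
```

The two are related: isCorrect holds exactly when log_of_ratio is below $\log(1/2)$.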

Access methods. Read-only access to the data members of a Ctxt object:

const FHEcontext& getContext() const;
const FHEPubKey& getPubKey() const;
const IndexSet& getPrimeSet() const;
const xdouble& getNoiseVar() const;
const long getPtxtSpace() const; // the plaintext-space modulus
const long getKeyID() const; // key-ID of the first part not pointing to 1

3.2  The  FHE  module:  Keys  and  key-switching  matrices 

Recall that we made the high-level design choice to allow instances of the cryptosystem to have
multiple secret keys. This decision was made to allow a leveled encryption system that does not
rely on circular security, as well as to support switching to a different key for different purposes
(which may be needed for bootstrapping, for example). However, we still view using just a single
secret key per instance (and relying on circular security) as the primary mode of using our library,
and hence provide more facilities to support this mode than the mode of using multiple keys.
Regardless of how many secret keys we have per instance, there is always just a single public
encryption key, for encryption relative to the first secret key. (The public key in our variant of
the BGV cryptosystem is just a ciphertext encrypting the constant 0.) In addition to this public
encryption key, the public key also contains key-switching matrices and some tables to help find
the right matrices to use in different settings. Ciphertexts relative to secret keys other than the
first (if any) can only be generated using the key-switching matrices in the public key.

3.2.1  The  KeySwitch  class 

This class implements key-switching matrices. As we described in Section 3.1.6, a key-switching
matrix from $s'$ to $s$, denoted $W[s' \Rightarrow s]$, is a $2 \times n$ matrix of polynomials from $A_Q$, where $Q$ is the
product of all the small primes in our chain (both ciphertext-primes and special-primes). Recall
that the ciphertext primes are partitioned into $n$ digits, where we denote the product of primes
corresponding to the i'th digit by $D_i$. Then the i'th column of the matrix $W[s' \Rightarrow s]$ is a pair of
elements $(a_i, b_i) \in A_Q$ that satisfy

$$[b_i + a_i \cdot s]_Q = \Big(\prod_{j<i} D_j\Big) \cdot s' + p \cdot e_i,$$


for a low-norm polynomial $e_i \in A_Q$. In more detail, we choose a low-norm polynomial $e_i \in A_Q$,
where each coefficient of $e_i$ is chosen from a discrete Gaussian over the integers with variance $\sigma^2$
(with $\sigma$ a parameter, by default $\sigma = 3.2$). Then we choose a uniformly random polynomial $a_i \in A_Q$ and set

$$b_i := \Big[\Big(\prod_{j<i} D_j\Big) \cdot s' + p \cdot e_i - a_i \cdot s\Big]_Q.$$
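To make the column identity concrete, here is a toy sketch, not the library's implementation. It builds one column over $\mathbb{Z}_Q[X]/(X^N+1)$, a power-of-two cyclotomic chosen only for brevity, with a single integer D standing in for $\prod_{j<i} D_j$. Error coefficients come from $\{-1,0,1\}$ rather than a discrete Gaussian, and mulMod, lincomb and keySwitchColumn are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

using Poly = std::vector<int64_t>;  // toy ring element, coefficients mod Q

// Negacyclic product in Z_Q[X]/(X^N + 1): a toy stand-in for A_Q.
Poly mulMod(const Poly& a, const Poly& b, int64_t Q) {
    const size_t N = a.size();
    Poly c(N, 0);
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j) {
            int64_t prod = ((a[i] % Q) * (b[j] % Q)) % Q;
            if (prod < 0) prod += Q;                              // now in [0, Q)
            size_t k = i + j;
            if (k < N) c[k] = (c[k] + prod) % Q;
            else       c[k - N] = (c[k - N] + Q - prod) % Q;      // X^N = -1
        }
    return c;
}

// Returns [f * a + g * b]_Q coefficient-wise, with entries in [0, Q).
Poly lincomb(int64_t f, const Poly& a, int64_t g, const Poly& b, int64_t Q) {
    Poly c(a.size());
    for (size_t i = 0; i < a.size(); ++i) {
        int64_t v = ((f % Q) * (a[i] % Q) % Q + (g % Q) * (b[i] % Q) % Q) % Q;
        c[i] = v < 0 ? v + Q : v;
    }
    return c;
}

struct Column { Poly a, b, e; };

// One column of W[s' => s]: a uniform, e small, b = [D*s' + p*e - a*s]_Q.
Column keySwitchColumn(const Poly& sPrime, const Poly& s, int64_t D, int64_t p,
                       int64_t Q, std::mt19937_64& rng) {
    const size_t N = s.size();
    std::uniform_int_distribution<int64_t> unif(0, Q - 1), small(-1, 1);
    Column col{Poly(N), Poly(N), Poly(N)};
    for (size_t i = 0; i < N; ++i) { col.a[i] = unif(rng); col.e[i] = small(rng); }
    Poly target = lincomb(D, sPrime, p, col.e, Q);           // D*s' + p*e
    col.b = lincomb(1, target, -1, mulMod(col.a, s, Q), Q);  // minus a*s
    return col;
}
```

Adding $a_i \cdot s$ back to $b_i$ recovers $D\cdot s' + p\cdot e_i$ modulo $Q$, which is exactly the column condition stated above.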


The matrix $W$ is stored in a KeySwitch object in a space-efficient manner: instead of storing the
random polynomials $a_i$ themselves, we only store a seed for a pseudorandom generator, from which
all the $a_i$'s are derived. The $b_i$'s are stored explicitly, however. We note that this space-efficient
representation requires that we assume hardness of our ring-LWE instances even when the seed for
generating the random elements is known, but this seems like a reasonable assumption.

In our library, the source secret key $s'$ is of the form $s' = s_{i'}^r(X^t)$ (for some index $i'$ and
exponents $r, t$), but the target $s$ must be a “base” secret key, i.e., $s = s_i(X)$ for some index $i$. In
addition to the matrix $W$, the KeySwitch object also stores a secret-key handle $(r, t, i')$ that identifies
the source secret key, as well as the index $i$ of the target secret key.

The KeySwitch class provides a method NumCols() that returns the number of columns in the
matrix  W.  We  maintain  the  invariant  that  all  the  key-switching  matrices  that  are  defined  relative 
to  some  context  have  the  same  number  of  columns,  which  is  also  equal  to  the  number  of  digits  that 
are  specified  in  the  context. 

3.2.2  The  FHEPubKey  class 

An  FHEPubKey  object  is  defined  relative  to  a  fixed  FHEcontext,  which  must  be  supplied  to  the 
constructor  and  cannot  be  changed  later.  An  FHEPubKey  includes  the  public  encryption  key 
(which  is  a  ciphertext  of  type  Ctxt),  a  vector  of  key-switching  matrices  (of  type  KeySwitch),  and 
another data structure (called keySwitchMap) that is meant to help find the right key-switching
matrices  to  use  for  automorphisms  (see  a  more  detailed  description  below).  In  addition,  for  every 
secret  key  in  this  instance,  the  FHEPubKey  object  stores  the  Hamming  weight  of  that  key,  i.e.,  the 
number  of  non-zero  coefficients  of  the  secret-key  polynomial.  (This  last  piece  of  information  is  used 
to  compute  the  estimated  noise  in  a  ciphertext.)  The  FHEPubKey  class  provides  an  encryption 
method,  and  various  methods  to  find  and  access  key-switching  matrices. 

long Encrypt(Ctxt& ciphertxt, const ZZX& plaintxt, long ptxtSpace=0) const; This method
returns in ciphertxt an encryption of the plaintext polynomial plaintxt, relative to the
plaintext-space modulus given in ptxtSpace. If the ptxtSpace parameter is not specified then we use the
plaintext-space modulus from the public encryption key in this FHEPubKey object, and
otherwise we use the greatest common divisor (GCD) of the specified value and the one from the public
encryption key. The current modulus in the new fresh ciphertext is the product of all the ciphertext-primes
in the context, which is the same as the current modulus in the public encryption key in
this FHEPubKey object.

Let the public encryption key in the FHEPubKey object be denoted $c^* = (c_0^*, c_1^*)$, let $Q_{ct}$ be
the product of all the ciphertext primes in the context, and let $p$ be the plaintext-space modulus
(namely the GCD of the parameter ptxtSpace and the plaintext-space modulus from the public
encryption key). The Encrypt method chooses a random low-norm polynomial $t \in A_{Q_{ct}}$ with
$-1/0/1$ coefficients, and low-norm error polynomials $e_0, e_1 \in A_{Q_{ct}}$, where each coefficient of the $e_i$'s is
chosen from a discrete Gaussian over the integers with variance $\sigma^2$ (with $\sigma$ a parameter, by default
$\sigma = 3.2$). We then compute and return the canonical ciphertext

$$c = (c_0, c_1) := t \cdot (c_0^*, c_1^*) + p \cdot (e_0, e_1) + (\mathrm{plaintxt}, 0).$$

Note that since the public encryption key satisfies $[c_0^* + s \cdot c_1^*]_{Q_{ct}} = p \cdot e^*$ for some low-norm
polynomial $e^*$, we have

$$[c_0 + s \cdot c_1]_{Q_{ct}} = [t \cdot (c_0^* + s \cdot c_1^*) + p \cdot (e_0 + s \cdot e_1) + \mathrm{plaintxt}]_{Q_{ct}} = p \cdot (e_0 + s \cdot e_1 + t \cdot e^*) + \mathrm{plaintxt}.$$
For the noise estimate in the new ciphertext, we multiply the noise estimate in the public encryption
key by the size of the low-norm $t$, and add another term for the expression $a = p \cdot (e_0 + s \cdot e_1) + \mathrm{plaintxt}$.
Specifically, the noise estimate in the public encryption key is $\mathrm{pubEncrKey.noiseVar} = \phi(m)\sigma^2 p^2$,
the second moment of $t(\tau_m)$ is $\phi(m)/2$, and the second moment of $a(\tau_m)$ is no more
than $p^2(1 + \sigma^2\phi(m)(H + 1))$, with $H$ the Hamming weight of the secret key $s$. Hence the noise
estimate in a freshly encrypted ciphertext is

$$\mathrm{noiseVar} = p^2 \cdot \big(1 + \sigma^2\phi(m) \cdot (\phi(m)/2 + H + 1)\big).$$
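The closed form follows by adding the two contributions. The sketch below codes them separately so that each term can be checked against the text; freshNoiseVar is a hypothetical helper, and plain double stands in for xdouble.

```cpp
#include <cassert>

// Fresh-ciphertext noise estimate assembled from the quantities in the text:
// pubEncrKey.noiseVar * E|t(tau_m)|^2 + (bound on E|a(tau_m)|^2).
double freshNoiseVar(double p, double sigma, double phiM, double H) {
    assert(p > 0 && sigma > 0 && phiM > 0 && H >= 0);
    const double pubKeyNoise = phiM * sigma * sigma * p * p;   // phi(m) sigma^2 p^2
    const double tMoment = phiM / 2.0;                          // E|t(tau_m)|^2
    const double aMoment = p * p * (1.0 + sigma * sigma * phiM * (H + 1.0));
    return pubKeyNoise * tMoment + aMoment;
}
```

Expanding, $\phi(m)\sigma^2p^2\cdot\phi(m)/2 + p^2(1 + \sigma^2\phi(m)(H+1)) = p^2(1 + \sigma^2\phi(m)(\phi(m)/2 + H + 1))$, which is the displayed formula.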

The key-switching matrices. An FHEPubKey object keeps a list of all the key-switching
matrices that were generated during key-generation in the data member keySwitching of type
vector<KeySwitch>. As explained above, each key-switching matrix is of the form $W[s_i^r(X^t) \Rightarrow s_j(X)]$,
and is identified by an SKHandle object that specifies $(r, t, i)$ and another integer that specifies
the target key-ID $j$. The basic facilities provided to find a key-switching matrix are the two
equivalent methods

const KeySwitch& getKeySWmatrix(const SKHandle& from, long toID=0) const;
const KeySwitch& getKeySWmatrix(long fromSPower, long fromXPower, long fromID=0,
                                long toID=0) const;

These methods return either a read-only reference to the requested matrix if it exists, or
otherwise a reference to a dummy KeySwitch object that has toKeyID = -1. For convenience we
also provide the methods bool haveKeySWmatrix that only test for existence, but do not return the
actual matrix. Another variant is the method

const KeySwitch& getAnyKeySWmatrix(const SKHandle& from) const;

(and  its  counterpart  bool  haveAnyKeySWmatrix)  that  look  for  a  matrix  with  the  given  source 
(r,  t,  i )  and  any  target.  All  these  methods  first  try  to  find  the  requested  matrix  using  the  keySwitchMap 
table  (which  is  described  below),  and  failing  that  they  resort  to  linear  search  through  the  entire  list 
of  matrices. 


The keySwitchMap table. Although our library supports key-switching matrices of the general
form $W[s_i^r(X^t) \Rightarrow s_j(X)]$, we provide more facilities for finding matrices to re-linearize after
automorphism (i.e., matrices of the form $W[s_i(X^t) \Rightarrow s_i(X)]$) than for other types of matrices.

For every secret key $s_i$ in the current instance we consider a graph $G_i$ over the vertex set $\mathbb{Z}_m^*$,
where we have an edge $j \to k$ if and only if we have a key-switching matrix $W[s_i(X^{jk^{-1}}) \Rightarrow s_i(X)]$
(where $jk^{-1}$ is computed modulo $m$). We observe that if the graph $G_i$ has a path $t \leadsto 1$, then we
can apply the automorphism $X \mapsto X^t$ with re-linearization to a canonical ciphertext relative to $s_i$
as follows. Denote the path from $t$ to 1 in the graph by

$$t = k_1 \to k_2 \to \cdots \to k_n = 1.$$

We follow the path one step at a time, for each step $j$ applying the automorphism $X \mapsto X^{k_j k_{j+1}^{-1}}$
and then re-linearizing the result using the matrix $W[s_i(X^{k_j k_{j+1}^{-1}}) \Rightarrow s_i(X)]$ from the public key.
Since the step exponents multiply to $k_1 k_n^{-1} = t$, the composition of these steps is exactly $X \mapsto X^t$.

The data member vector< vector<long> > keySwitchMap encodes all these graphs $G_i$ in a
way that makes it easy to find the sequence of operations needed to implement any given
automorphism. For every $i$, keySwitchMap[i] is a vector of indexes that stores information about the
graph $G_i$. Specifically, keySwitchMap[i][t] is an index into the vector of key-switching matrices,
pointing out the first step in the shortest path $t \leadsto 1$ in $G_i$ (if any). In other words, if 1 is reachable
from $t$ in $G_i$, then keySwitchMap[i][t] is an index $k$ such that keySwitching[k] $= W[s_i(X^{tu^{-1}}) \Rightarrow s_i(X)]$,
where $u$ is one step closer to 1 in $G_i$ than $t$. In particular, if we have in the public key a
matrix $W[s_i(X^t) \Rightarrow s_i(X)]$, then keySwitchMap[i][t] contains the index of that matrix. If 1 is
not reachable from $t$ in $G_i$, then keySwitchMap[i][t] = -1.

The maps in keySwitchMap are built using a breadth-first search on the graph $G_i$, by calling the
method void setKeySwitchMap(long keyID=0). This method should be called after all the
key-switching matrices are added to the public key. If more matrices are generated later, then it should
be called again. Once keySwitchMap is initialized, it is used by the method Ctxt::smartAutomorph
as follows: to implement $X \mapsto X^t$ on a canonical ciphertext relative to secret key $s_i$, we do the
following:

1. while $t \neq 1$
2.     set j = pubKey.keySwitchMap[i][t]      // matrix index
3.     set matrix = pubKey.keySwitching[j]     // the matrix itself
4.     set k = matrix.fromKey.getPowerOfX()    // the next step
5.     perform the automorphism $X \mapsto X^k$, then re-linearize
6.     set $t = t \cdot k^{-1} \bmod m$                // now we are one step closer to 1
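The loop above can be traced without any ciphertexts by recording the exponents it would apply. In this sketch, automorphismSteps and invModSmall are hypothetical names. keySwitchMap is a single row of the table, indexing into matrixPowerOfX, which holds the exponent of each matrix. The brute-force inverse is only meant for small toy moduli.

```cpp
#include <cassert>
#include <vector>

// Brute-force inverse modulo m (adequate for the small toy moduli used here).
long invModSmall(long k, long m) {
    for (long x = 1; x < m; ++x)
        if (k * x % m == 1) return x;
    assert(false && "k is not invertible modulo m");
    return -1;
}

// Returns the exponents k of the automorphisms X -> X^k applied, in order.
std::vector<long> automorphismSteps(long t, const std::vector<long>& keySwitchMap,
                                    const std::vector<long>& matrixPowerOfX, long m) {
    std::vector<long> steps;
    while (t != 1) {
        long j = keySwitchMap[t];         // matrix index
        assert(j >= 0);                   // otherwise 1 is unreachable from t
        long k = matrixPowerOfX[j];       // the next step
        steps.push_back(k);               // here: apply X -> X^k, re-linearize
        t = t * invModSmall(k, m) % m;    // one step closer to 1
    }
    return steps;
}
```

With $m = 7$ and a single matrix for $k = 3$ (which generates $\mathbb{Z}_7^*$), implementing $X \mapsto X^6$ takes three steps of $X \mapsto X^3$, since $3^3 = 27 \equiv 6 \pmod 7$.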

The operations in steps 2 and 3 above are combined in the method

const KeySwitch& FHEPubKey::getNextKSWmatrix(long t, long i);

That is, on input $t, i$ it returns the matrix whose index in the list is $j$ = keySwitchMap[i][t]. Also,
the convenience method bool FHEPubKey::isReachable(long t, long keyID=0) const checks
whether keySwitchMap[keyID][t] is defined or is -1 (meaning that 1 is not reachable from $t$ in the
graph $G_{\mathrm{keyID}}$).


3.2.3  The  FHESecKey  class 

The  FHESecKey  class  is  derived  from  FHEPubKey,  and  contains  an  additional  data  member  with  the 
secret  key(s),  vector<DoubleCRT>  sKeys.  It  also  provides  methods  for  key-generation,  decryption, 
and  generation  of  key-switching  matrices,  as  described  next. 

Key-generation.  The  FHESecKey  class  provides  methods  for  either  generating  a  new  secret-key 
polynomial  with  a  specified  Hamming  weight,  or  importing  a  new  secret  key  that  was  generated  by 
the  calling  application.  That  is,  we  have  the  methods: 

long ImportSecKey(const DoubleCRT& sKey, long hwt, long ptxtSpace=0);
long GenSecKey(long hwt, long ptxtSpace=0);

For both these methods, if the plaintext-space modulus is unspecified then it is taken to be the
default $2^r$ from the context. The first of these methods takes a secret key that was generated by
the application and inserts it into the list of secret keys, keeping track of the Hamming weight of
the key and the plaintext-space modulus which is supposed to be used with this key. The second
method chooses a random secret-key polynomial with coefficients $-1/0/1$ where exactly hwt of
them are non-zero, then it calls ImportSecKey to insert the newly generated key into the list. Both
of these methods return the key-ID, i.e., the index of the new secret key in the list of secret keys.
Also, for every new secret-key polynomial $s_i$, we generate and store a key-switching matrix
$W[s_i^2(X) \Rightarrow s_i(X)]$ for re-linearizing ciphertexts after multiplication.
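The sampling rule of GenSecKey (coefficients in $\{-1,0,1\}$ with exactly hwt non-zero) can be sketched with the standard library. This sketch assumes a plain coefficient vector of length $n$ and std::mt19937_64 as the randomness source. sampleTernaryKey is a hypothetical name, and the library's actual sampler and key representation may differ. A real system would also need a cryptographically secure generator.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Samples a length-n coefficient vector with exactly hwt nonzero entries,
// each -1 or +1 with equal probability, at uniformly random positions.
std::vector<long> sampleTernaryKey(long n, long hwt, std::mt19937_64& rng) {
    assert(0 <= hwt && hwt <= n);
    std::vector<long> positions(n);
    for (long i = 0; i < n; ++i) positions[i] = i;
    std::shuffle(positions.begin(), positions.end(), rng);  // random support
    std::bernoulli_distribution coin(0.5);
    std::vector<long> key(n, 0);
    for (long i = 0; i < hwt; ++i)
        key[positions[i]] = coin(rng) ? 1 : -1;
    return key;
}
```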
The first time that ImportSecKey is called for a specific instance, it also generates a public
encryption key relative to this first secret key. Namely, for the first secret key $s$ it chooses at random
a polynomial $c_1^* \in A_{Q_{ct}}$ (where $Q_{ct}$ is the product of all the ciphertext primes) as well as a low-norm
error polynomial $e^* \in A_{Q_{ct}}$ (with Gaussian coefficients), then sets $c_0^* := [\mathrm{ptxtSpace} \cdot e^* - s \cdot c_1^*]_{Q_{ct}}$.

Clearly the resulting pair $(c_0^*, c_1^*)$ satisfies $m^* \stackrel{\mathrm{def}}{=} [c_0^* + s \cdot c_1^*]_{Q_{ct}} = \mathrm{ptxtSpace} \cdot e^*$, and the noise
estimate for this public encryption key is $\mathrm{noiseVar}^* = E[|m^*(\tau_m)|^2] = p^2\sigma^2 \cdot \phi(m)$, with $p$ = ptxtSpace.

Decryption.  The  decryption  process  is  rather  straightforward.  We  go  over  all  the  ciphertext 
parts  in  the  given  ciphertext,  multiply  each  part  by  the  secret  key  that  this  part  points  to,  and  sum 
the  result  modulo  the  current  BGV  modulus.  Then  we  reduce  the  result  modulo  the  plaintext-space 
modulus,  which  gives  us  the  plaintext.  This  is  implemented  in  the  method 
void Decrypt(ZZX& plaintxt, const Ctxt& ciphertxt) const;
which returns the result in the plaintxt argument. For debugging purposes, we also provide the
method void Decrypt(ZZX& plaintxt, const Ctxt& ciphertxt, ZZX& f) const, which also returns
the polynomial before reduction modulo the plaintext-space modulus. We stress that it would
be insecure to use this method in a production system; it is provided only for testing and debugging
purposes.

Generating  key-switching  matrices.  We  also  provide  an  interface  for  generating  key-switching 
matrices,  using  the  method: 

void GenKeySWmatrix(long fromSPower, long fromXPower, long fromKeyIdx=0,
                    long toKeyIdx=0, long ptxtSpace=0);

This method checks if the relevant key-switching matrix already exists, and if not then it generates
it (as described in Section 3.2.1) and inserts it into the list keySwitching. If left unspecified, the
plaintext space defaults to $2^r$, as defined by context.mod2r.

Secret-key encryption. We also provide a secret-key encryption method that produces
ciphertexts with slightly smaller noise than the public-key encryption method. Namely, we have the
method

long FHESecKey::Encrypt(Ctxt &c, const ZZX& ptxt, long ptxtSpace, long skIdx) const;
which encrypts the polynomial ptxt relative to plaintext-space modulus ptxtSpace and the secret
key whose index is skIdx. Similarly to the choice of the public encryption key, the Encrypt
method chooses at random a polynomial $c_1 \in A_{Q_{ct}}$ (where $Q_{ct}$ is the product of all the ciphertext
primes) as well as a low-norm error polynomial $e \in A_{Q_{ct}}$ (with Gaussian coefficients), then sets
$c_0 := [\mathrm{ptxtSpace} \cdot e + \mathrm{ptxt} - s \cdot c_1]_{Q_{ct}}$. Clearly the resulting pair $(c_0, c_1)$ satisfies
$m \stackrel{\mathrm{def}}{=} [c_0 + s \cdot c_1]_{Q_{ct}} = \mathrm{ptxtSpace} \cdot e + \mathrm{ptxt}$, and the noise estimate for this ciphertext is
$\mathrm{noiseVar} \approx E[|m(\tau_m)|^2] = p^2\sigma^2 \cdot \phi(m)$.
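A round-trip sketch of secret-key encryption and decryption in a toy ring. The ring is $\mathbb{Z}_Q[X]/(X^N+1)$, with N = 8 and $Q = 1000003$ chosen only for illustration. Errors are drawn uniformly from $\{-2,\ldots,2\}$ instead of a discrete Gaussian, and mulMod, skEncrypt, skDecrypt and ToyCiphertext are hypothetical names. This illustrates the equations above; it is not a secure or faithful implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

using Poly = std::vector<int64_t>;

// Negacyclic product in Z_Q[X]/(X^N + 1), a toy stand-in for A_{Q_ct}.
Poly mulMod(const Poly& a, const Poly& b, int64_t Q) {
    const size_t N = a.size();
    Poly c(N, 0);
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j) {
            int64_t prod = ((a[i] % Q) * (b[j] % Q)) % Q;
            if (prod < 0) prod += Q;
            size_t k = i + j;
            if (k < N) c[k] = (c[k] + prod) % Q;
            else       c[k - N] = (c[k - N] + Q - prod) % Q;   // X^N = -1
        }
    return c;
}

struct ToyCiphertext { Poly c0, c1; };

// c1 uniform, e small, c0 = [p*e + ptxt - s*c1]_Q.
ToyCiphertext skEncrypt(const Poly& ptxt, const Poly& s, int64_t p, int64_t Q,
                        std::mt19937_64& rng) {
    const size_t N = s.size();
    std::uniform_int_distribution<int64_t> unif(0, Q - 1), small(-2, 2);
    ToyCiphertext c{Poly(N), Poly(N)};
    for (size_t i = 0; i < N; ++i) c.c1[i] = unif(rng);
    Poly sc1 = mulMod(s, c.c1, Q);
    for (size_t i = 0; i < N; ++i) {
        int64_t e = small(rng);                 // stand-in for a Gaussian sample
        c.c0[i] = ((p * e + ptxt[i] - sc1[i]) % Q + Q) % Q;
    }
    return c;
}

// [c0 + s*c1]_Q in the symmetric range, then reduced modulo p.
Poly skDecrypt(const ToyCiphertext& c, const Poly& s, int64_t p, int64_t Q) {
    Poly sc1 = mulMod(s, c.c1, Q), out(s.size());
    for (size_t i = 0; i < s.size(); ++i) {
        int64_t v = (c.c0[i] + sc1[i]) % Q;     // in [0, Q)
        if (v > Q / 2) v -= Q;                  // equals p*e + ptxt here
        out[i] = (v % p + p) % p;
    }
    return out;
}
```

Decryption succeeds here because $|p\cdot e + \mathrm{ptxt}|$ stays far below $Q/2$. That is the condition isCorrect tests through the noise estimate.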

3.3  The  KeySwitching  module:  What  matrices  to  generate 

This module implements a few useful strategies for deciding what key-switching matrices for
automorphism to choose during key-generation. Specifically, we have the following methods:
void addAllMatrices(FHESecKey& sKey, long keyID=0);

For $i$ = keyID, generate key-switching matrices $W[s_i(X^t) \Rightarrow s_i(X)]$ for all $t \in \mathbb{Z}_m^*$.

void add1DMatrices(FHESecKey& sKey, long keyID=0);

For $i$ = keyID, generate key-switching matrices $W[s_i(X^{g^e}) \Rightarrow s_i(X)]$ for every generator $g$ of
$\mathbb{Z}_m^*/(2)$ with order $\mathrm{ord}(g)$, and every exponent $0 < e < \mathrm{ord}(g)$. Also, if the order of $g$ in $\mathbb{Z}_m^*$ is
not the same as its order in $\mathbb{Z}_m^*/(2)$, then also generate the matrices $W[s_i(X^{g^{-e}}) \Rightarrow s_i(X)]$
(cf. Section 2.4).

We  note  that  these  matrices  are  enough  to  implement  all  the  automorphisms  that  are  needed 
for  the  data-movement  routines  from  Section  4. 


void addSome1DMatrices(FHESecKey& sKey, long bound=100, long keyID=0);

For $i$ = keyID, we generate just a subset of the matrices that are generated by add1DMatrices,
so that each of the automorphisms $X \mapsto X^{g^e}$ can be implemented in at most two steps (and
similarly for $X \mapsto X^{g^{-e}}$ for generators whose orders in $\mathbb{Z}_m^*$ and $\mathbb{Z}_m^*/(2)$ are different). In
other words, we ensure that the graph $G_i$ (cf. Section 3.2.2) has a path of length at most 2
from $g^e$ to 1 (and also from $g^{-e}$ to 1 for $g$'s of different orders).


In more detail, if $\mathrm{ord}(g) \le \mathrm{bound}$ then we generate all the matrices $W[s_i(X^{g^e}) \Rightarrow s_i(X)]$
(or $W[s_i(X^{g^{-e}}) \Rightarrow s_i(X)]$) just like in add1DMatrices. When $\mathrm{ord}(g) > \mathrm{bound}$, however, we
generate only $O(\sqrt{\mathrm{ord}(g)})$ matrices for this generator: denoting $B_g = \lceil\sqrt{\mathrm{ord}(g)}\rceil$, for every
$0 < e < B_g$ let $e' = e \cdot B_g$; then we generate the matrices $W[s_i(X^{g^{e}}) \Rightarrow s_i(X)]$ and
$W[s_i(X^{g^{e'}}) \Rightarrow s_i(X)]$. In addition, if $g$ has a different order in $\mathbb{Z}_m^*$ and $\mathbb{Z}_m^*/(2)$, then we
also generate $W[s_i(X^{g^{-e'}}) \Rightarrow s_i(X)]$.


void addFrbMatrices(FHESecKey& sKey, long keyID=0);

For $i$ = keyID, generate key-switching matrices $W[s_i(X^{2^e}) \Rightarrow s_i(X)]$ for $0 < e < d$, where $d$ is
the order of 2 in $\mathbb{Z}_m^*$.
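The $O(\sqrt{\mathrm{ord}(g)})$ option of addSome1DMatrices is a baby-step/giant-step split of the exponent. The sketch below tracks only the exponents of one generator $g$: no matrices are built, $g^e \bmod m$ is never computed, and the negative-exponent matrices for generators of different orders are left out. BsgsExponents, bsgsExponents and stepsNeeded are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Exponents of g for which matrices would exist under the O(sqrt(ord)) option.
struct BsgsExponents {
    std::set<long> baby;    // e with 0 < e < B
    std::set<long> giant;   // e' = e * B with 0 < e < B
    long B;                 // B = ceil(sqrt(ord(g)))
};

BsgsExponents bsgsExponents(long ord) {
    assert(ord > 0);
    long B = 1;
    while (B * B < ord) ++B;                  // exact integer ceil(sqrt(ord))
    BsgsExponents r{{}, {}, B};
    for (long e = 1; e < B; ++e) {
        r.baby.insert(e);
        r.giant.insert(e * B);
    }
    return r;
}

// Any 0 < x < ord splits as x = a + b*B with 0 <= a, b < B (because
// ord <= B*B), so X -> X^{g^x} takes at most two steps: X^{g^a}, X^{g^{b*B}}.
int stepsNeeded(long x, const BsgsExponents& r) {
    long a = x % r.B, b = x / r.B;
    assert(b < r.B);
    assert(a == 0 || r.baby.count(a) == 1);
    assert(b == 0 || r.giant.count(b * r.B) == 1);
    return (a != 0) + (b != 0);
}
```

This needs $2(B_g - 1)$ exponents instead of the $\mathrm{ord}(g) - 1$ used by add1DMatrices. That is why the split pays off only when $\mathrm{ord}(g)$ exceeds the bound.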


4  The  Data-Movement  Layer 

At the top level of our library, we provide some interfaces that allow the application to manipulate
arrays of plaintext values homomorphically. The arrays are translated to plaintext polynomials
using the encoding/decoding routines provided by PAlgebraModTwo/PAlgebraMod2r (cf. Section 2.5),
and then encrypted and manipulated homomorphically using the lower-level interfaces from the
crypto layer.

4.1  The  classes  EncryptedArray  and  EncryptedArrayMod2r 

These classes present the plaintext values to the application as either a linear array (with as many
entries as there are elements in $\mathbb{Z}_m^*/(2)$), or as a multi-dimensional array corresponding to the
structure of the group $\mathbb{Z}_m^*/(2)$. The difference between EncryptedArray and EncryptedArrayMod2r
is that the former is used when the plaintext-space modulus is 2, while the latter is used when it is
$2^r$ for some $r > 1$. Another difference between them is that EncryptedArray also supports plaintext
values in binary extension fields $\mathbb{F}_{2^n}$, while EncryptedArrayMod2r only supports integer plaintext
values from $\mathbb{Z}_{2^r}$. This is reflected in the constructors for these types: for EncryptedArray we have
the constructor
EncryptedArray(const FHEcontext& context, const GF2X& G=GF2X(1, 1));

that takes as input both the context (that specifies $m$) and a binary polynomial $G$ for the
representation of $\mathbb{F}_{2^n}$ (with $n$ the degree of $G$). The default value for the polynomial is $G(X) = X$,
resulting in plaintext values in the base field $\mathbb{F}_2 = \mathbb{Z}_2$ (i.e., individual bits). On the other hand, the
constructor for EncryptedArrayMod2r is

EncryptedArrayMod2r(const FHEcontext& context);

that takes only the context (specifying $m$ and the plaintext-space modulus $2^r$), and the plaintext
values are always in the base ring $\mathbb{Z}_{2^r}$. In either case, the constructor computes and stores various
“masks”, which are polynomials that have 1's in some of their plaintext slots and 0 in the other
slots. These masks are used in the implementation of the data-movement procedures, as described
below.

The multi-dimensional array view. This view arranges the plaintext slots in a multi-dimensional
array, corresponding to the structure of $\mathbb{Z}_m^*/(2)$. The number of dimensions is the same as the
number of generators that we have for $\mathbb{Z}_m^*/(2)$, and the size along the i'th dimension is the order
of the i'th generator.

Recall from Section 2.4 that each plaintext slot is represented by some $t \in \mathbb{Z}_m^*$, such that the set
of representatives $T \subset \mathbb{Z}_m^*$ has exactly one element from each conjugacy class of $\mathbb{Z}_m^*/(2)$. Moreover,
if $f_1, \ldots, f_n \in T$ are the generators of $\mathbb{Z}_m^*/(2)$ (with $f_i$ having order $\mathrm{ord}(f_i)$), then every $t \in T$ can
be written uniquely as $t = [\prod_i f_i^{e_i}]_m$ with each exponent $e_i$ taken from the range $0 \le e_i < \mathrm{ord}(f_i)$.
The generators are roughly arranged by their order (i.e., $\mathrm{ord}(f_i) \ge \mathrm{ord}(f_{i+1})$), except that we put
all the generators that have the same order in $\mathbb{Z}_m^*$ and $\mathbb{Z}_m^*/(2)$ before all the generators that have
different orders in the two groups.

Hence the multi-dimensional-array view of the plaintext slots will have them arranged in an
n-dimensional hypercube, with the size of the i'th side being $\mathrm{ord}(f_i)$. Every entry in this hypercube
is indexed by some $e = (e_1, e_2, \ldots, e_n)$, and it contains the plaintext slot associated with the
representative $t = [\prod_i f_i^{e_i}]_m \in T$. (Note that the lexicographic order on the vectors $e$ of indexes
induces a linear ordering on the plaintext slots, which is what we use in our linear-array view
described below.) The multi-dimensional-array view provides the following interfaces:

long dimension() const; returns the dimensionality (i.e., the number of generators in $\mathbb{Z}_m^*/(2)$).

long sizeOfDimension(long i); returns the size along the i’th dimension (i.e., $\mathrm{ord}(f_i)$).

long coordinate(long i, long k) const; returns the i’th entry of the k’th vector in lexicographic order.

void rotate1D(Ctxt& ctxt, long i, long k) const;

This method rotates the hypercube by k positions along the i’th dimension, moving the content of the slot indexed by $(e_1, \ldots, e_i, \ldots, e_n)$ to the slot indexed by $(e_1, \ldots, e_i + k, \ldots, e_n)$, with addition modulo $\mathrm{ord}(f_i)$. Note that the argument k above can be either positive or negative, and rotating by −k is the same as rotating by $\mathrm{ord}(f_i) - k$.

The rotate operation is closely related to the “native” automorphism operation of the lower-level Ctxt class. Indeed, if $f_i$ has the same order in $\mathbb{Z}_m^*$ as in $\mathbb{Z}_m^*/(2)$, then we just apply the automorphism $X \mapsto X^{f_i^k}$ on the input ciphertext using ctxt.smartAutomorph($f_i^k$). If $f_i$ has different orders in $\mathbb{Z}_m^*$ and $\mathbb{Z}_m^*/(2)$, then we need to apply the two automorphisms $X \mapsto X^{f_i^k}$ and $X \mapsto X^{f_i^{k-\mathrm{ord}(f_i)}}$ and then “mix and match” the two resulting ciphertexts to pick from each of them the plaintext slots that did not undergo wraparound (see the description of the select method below).

void shift1D(Ctxt& ctxt, long i, long k) const;

This is similar to rotate1D, except it implements a non-cyclic shift with zero fill. Namely, for a positive k > 0, the content of any slot indexed by $(e_1, \ldots, e_i, \ldots, e_n)$ with $e_i < \mathrm{ord}(f_i) - k$ is moved to the slot indexed by $(e_1, \ldots, e_i + k, \ldots, e_n)$, and all the other slots are filled with zeros. For a negative k < 0, the content of any slot indexed by $(e_1, \ldots, e_i, \ldots, e_n)$ with $e_i \ge |k|$ is moved to the slot indexed by $(e_1, \ldots, e_i + k, \ldots, e_n)$, and all the other slots are filled with zeros.

The operation is implemented by applying the corresponding automorphism(s), and then zeroing out the wraparound slots by multiplying the result by a constant polynomial that has zero in these slots.
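The following plaintext model is a sketch under our own assumptions rather than the library's code. It mimics coordinate, rotate1D and shift1D on an ordinary vector that stores the slots in lexicographic order. Dimensions are 0-based here, and sizes[i] stands for $\mathrm{ord}(f_{i+1})$. Multiplying by a 0/1 mask stands in for multiplying a ciphertext by a constant polynomial. The helper names (strideOf, Slots) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Slots = std::vector<long>;

// Distance, in the linear (lexicographic) order, between neighbouring
// slots along dimension i; dimension 0 is the most significant digit.
long strideOf(const std::vector<long>& sizes, std::size_t i) {
  long s = 1;
  for (std::size_t j = i + 1; j < sizes.size(); ++j) s *= sizes[j];
  return s;
}

// coordinate: the i'th entry of the k'th vector in lexicographic order.
long coordinate(const std::vector<long>& sizes, std::size_t i, long k) {
  return (k / strideOf(sizes, i)) % sizes[i];
}

// rotate1D: slot (..., e_i, ...) moves to (..., (e_i + k) mod ord, ...).
Slots rotate1D(const Slots& c, const std::vector<long>& sizes,
               std::size_t i, long k) {
  const long st = strideOf(sizes, i), ord = sizes[i];
  const long r = ((k % ord) + ord) % ord;   // rotating by -k == by ord - k
  Slots out(c.size());
  for (long p = 0; p < static_cast<long>(c.size()); ++p) {
    const long ei = (p / st) % ord;
    out[p + (((ei + r) % ord) - ei) * st] = c[p];
  }
  return out;
}

// shift1D: rotate, then multiply slot-wise by a 0/1 mask that is zero in
// exactly the slots that received wrapped-around content.
Slots shift1D(const Slots& c, const std::vector<long>& sizes,
              std::size_t i, long k) {
  Slots out = rotate1D(c, sizes, i, k);
  const long st = strideOf(sizes, i), ord = sizes[i];
  for (long p = 0; p < static_cast<long>(out.size()); ++p) {
    const long ei = (p / st) % ord;          // coordinate of destination slot
    const bool wrapped = (k > 0) ? (ei < k) : (ei >= ord + k);
    out[p] *= wrapped ? 0 : 1;
  }
  return out;
}
```

With sizes {2, 3}, rotating {0, 1, 2, 3, 4, 5} by 1 along dimension 1 yields {2, 0, 1, 5, 3, 4}, while shift1D with the same arguments yields {0, 0, 1, 0, 3, 4}.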

The linear array view. This view arranges the plaintext slots in a linear array, with as many entries as there are plaintext slots (i.e., $|\mathbb{Z}_m^*/(2)|$). These entries are ordered according to the lexicographic order on the vectors of indexes from the multi-dimensional array view above. In other words, we obtain a linear array simply by “opening up” the hypercube from above in lexicographic order. The linear-array view provides the following interfaces:

long size() const; returns the number of entries in the array, i.e., the number of plaintext slots.

void rotate(Ctxt& ctxt, long k) const;

Cyclically rotate the linear array by k positions, moving the content of the j’th slot (by the lexicographic order) to slot j + k, with addition modulo the number of slots. (Below we denote the number of slots by N.) Rotation by a negative number −N < k < 0 is the same as rotation by the positive amount k + N.

The procedure for implementing this cyclic rotation is roughly a concurrent version of the grade-school addition-with-carry procedure, building on the multi-dimensional rotations from above. What we need to do is to add k (modulo N) to the index of each plaintext slot, all in parallel. To that end, we think of the indexes (and the rotation amount k) as they are represented in the lexicographic order above. Namely, we identify k with the vector $e^{(k)} = (e_1^{(k)}, \ldots, e_n^{(k)})$, which is the k’th vector in the lexicographic order (and similarly identify each index j with the j’th vector $e^{(j)}$ in that order). We can now think of rotation by k as adding the multi-precision vector $e^{(k)}$ to all the vectors $e^{(j)}$, j = 0, 1, …, N − 1, in parallel.
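The digit-wise view of rotation can be checked with a small sequential sketch. Adding the digit vectors of j and k with carries, least-significant digit first, and dropping the final carry gives (j + k) mod N. This C++ function (addDigitVectors is a hypothetical name) handles a single index. The library instead performs the same additions for all slots at once through rotations and masks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds the digit vectors of j and k (lexicographic order, digit 0 most
// significant, sizes[i] = ord(f_{i+1})) with carries, least-significant
// digit first, and returns the resulting linear index. A carry out of the
// most significant digit is dropped, which is exactly reduction mod N.
// Assumes 0 <= j, k < N.
long addDigitVectors(const std::vector<long>& sizes, long j, long k) {
  const std::size_t n = sizes.size();
  std::vector<long> ej(n), ek(n);
  for (std::size_t i = n; i-- > 0;) {   // decompose j and k into digits
    ej[i] = j % sizes[i]; j /= sizes[i];
    ek[i] = k % sizes[i]; k /= sizes[i];
  }
  long carry = 0;
  for (std::size_t i = n; i-- > 0;) {   // grade-school addition with carry
    const long s = ej[i] + ek[i] + carry;
    carry = (s >= sizes[i]) ? 1 : 0;
    ej[i] = s - carry * sizes[i];
  }
  long result = 0;                      // re-assemble the linear index
  for (std::size_t i = 0; i < n; ++i) result = result * sizes[i] + ej[i];
  return result;
}
```

With sizes {2, 3, 4} (so N = 24), adding 10 to index 17 produces carries out of the last two digits and returns 3 = 27 mod 24.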

Beginning with the least-significant digit in these vectors, we use rotate-by-$e_n^{(k)}$ along the n’th dimension to implement the operation $e_n^{(j)} := e_n^{(j)} + e_n^{(k)} \bmod \mathrm{ord}(f_n)$ for all j at once.

Moving to the next digit, we now have to add to each $e_{n-1}^{(j)}$ either $e_{n-1}^{(k)}$ or $1 + e_{n-1}^{(k)}$, depending on whether or not there was a carry from the previous position. To do that, we compute two rotations along the (n − 1)’th dimension, by $e_{n-1}^{(k)}$ and by $1 + e_{n-1}^{(k)}$, then use a MUX operation to choose the right rotation amount for every slot. Namely, indexes j for which $e_n^{(j)} \ge \mathrm{ord}(f_n) - e_n^{(k)}$ (so we have a carry) are taken from the copy that was rotated by $1 + e_{n-1}^{(k)}$, while other indexes j are taken from the copy that was rotated by $e_{n-1}^{(k)}$.
The MUX operation is implemented by preparing a constant polynomial that has 1’s in the slots corresponding to indexes $(e_1, \ldots, e_n)$ with $e_n \ge \mathrm{ord}(f_n) - e_n^{(k)}$ and 0’s in all the other slots (call this polynomial mask), then computing $c' = c_1 \cdot \mathrm{mask} + c_2 \cdot (1 - \mathrm{mask})$, where $c_1, c_2$ are the two ciphertexts generated by rotation along dimension n − 1 by $1 + e_{n-1}^{(k)}$ and $e_{n-1}^{(k)}$, respectively.

We then move to the next digit, preparing a mask for those j’s for which we have a carry into that position, then rotating by $1 + e_{n-2}^{(k)}$ and $e_{n-2}^{(k)}$ along the (n − 2)’nd dimension and using the mask to do the MUX between these two ciphertexts. We proceed in a similar manner until the most significant digit. To complete the description of the algorithm, note that the mask $M_i$ used when processing the i’th digit marks the indexes that receive a carry from position i + 1. For each index j, which is represented by the vector $(e_1^{(j)}, \ldots, e_n^{(j)})$, we have $M_i[j] = 1$ if either $e_{i+1}^{(j)} \ge \mathrm{ord}(f_{i+1}) - e_{i+1}^{(k)}$, or $e_{i+1}^{(j)} = \mathrm{ord}(f_{i+1}) - e_{i+1}^{(k)} - 1$ and $M_{i+1}[j] = 1$ (i.e., we had a carry from position i + 2 into position i + 1). Hence the rotation procedure works as follows:

Rotate(c, k):

0. Let $(e_1^{(k)}, \ldots, e_n^{(k)})$ be the k’th vector in lexicographic order.
1. $M_n$ := all-0 mask // $M_n$ has 0 in all the slots: no carry enters the least-significant digit
2. Rotate c by $e_n^{(k)}$ along the n’th dimension
3. For i = n − 1 down to 1
4.    $M'_i$ := 1 in the slots j with $e_{i+1}^{(j)} \ge \mathrm{ord}(f_{i+1}) - e_{i+1}^{(k)}$, 0 in all the other slots
5.    $M''_i$ := 1 in the slots j with $e_{i+1}^{(j)} = \mathrm{ord}(f_{i+1}) - e_{i+1}^{(k)} - 1$, 0 in all the other slots
6.    $M_i := M'_i + M''_i \cdot M_{i+1}$ // The i’th mask
7.    c′ := rotate c by $e_i^{(k)}$ along the i’th dimension
8.    c′′ := rotate c by $1 + e_i^{(k)}$ along the i’th dimension
9.    c := c′′ · $M_i$ + c′ · (1 − $M_i$)
10. Return c.
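The procedure above can be simulated on plaintext vectors to check that rotations along single dimensions plus 0/1 masks really produce a linear rotation. The C++ sketch below is our own model, not the library's code. Slots are an ordinary vector in lexicographic order, and dimensions are 0-based (dimension 0 is the most significant digit). The carry masks are recomputed from the slot positions after the preceding rotations, and no carry enters the least-significant digit. In terms of those current positions, the first carry test $e_n^{(j)} \ge \mathrm{ord}(f_n) - e_n^{(k)}$ becomes “current digit < $e_n^{(k)}$”. The names strideOf, rotate1D and rotateLinear are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Slots = std::vector<long>;

// Plaintext model: sizes[i] stands for ord(f_{i+1}).
static long strideOf(const std::vector<long>& sizes, std::size_t i) {
  long s = 1;
  for (std::size_t j = i + 1; j < sizes.size(); ++j) s *= sizes[j];
  return s;
}

// rotate1D: moves the item at coordinate e_i to (e_i + k) mod ord(f_i).
static Slots rotate1D(const Slots& c, const std::vector<long>& sizes,
                      std::size_t i, long k) {
  const long st = strideOf(sizes, i), ord = sizes[i];
  const long r = ((k % ord) + ord) % ord;
  Slots out(c.size());
  for (long p = 0; p < static_cast<long>(c.size()); ++p) {
    const long ei = (p / st) % ord;
    out[p + (((ei + r) % ord) - ei) * st] = c[p];
  }
  return out;
}

// Linear rotation by k built only from rotate1D calls and 0/1 masks (MUX).
Slots rotateLinear(Slots c, const std::vector<long>& sizes, long k) {
  const long N = static_cast<long>(c.size());
  const std::size_t n = sizes.size();
  k = ((k % N) + N) % N;
  std::vector<long> e(n);                        // step 0: digits of k
  for (std::size_t i = 0; i < n; ++i)
    e[i] = (k / strideOf(sizes, i)) % sizes[i];

  c = rotate1D(c, sizes, n - 1, e[n - 1]);       // step 2
  // carry[p] == 1 iff the item now in slot p carries into the next digit.
  std::vector<long> carry(N);
  for (long p = 0; p < N; ++p) carry[p] = (p % sizes[n - 1]) < e[n - 1];

  for (std::size_t i = n - 1; i-- > 0;) {       // steps 3-9
    const Slots c1 = rotate1D(c, sizes, i, e[i]);      // c'
    const Slots c2 = rotate1D(c, sizes, i, e[i] + 1);  // c''
    // Rotating along dimension i leaves coordinates i+1..n-1 unchanged, so
    // the carry mask (a function of those coordinates) stays aligned.
    const long st = strideOf(sizes, i);
    for (long p = 0; p < N; ++p) {
      c[p] = c2[p] * carry[p] + c1[p] * (1 - carry[p]);  // c''*M + c'*(1-M)
      const long d = (p / st) % sizes[i];                // new digit i
      // carry out of digit i: d < e_i, or d == e_i with a carry in
      carry[p] = (d < e[i] || (d == e[i] && carry[p] != 0)) ? 1 : 0;
    }
  }
  return c;
}
```

For sizes {2, 3} and k = 2, the input {0, 1, 2, 3, 4, 5} becomes {4, 5, 0, 1, 2, 3}, the same as a direct cyclic rotation of the linear array.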

void shift(Ctxt& ctxt, long k) const; Non-cyclic shift of the linear array by k positions, with zero-fill. For a positive k > 0, every slot j ≥ k gets the content of slot j − k, and every slot j < k gets zero. For a negative k < 0, every slot j < N − |k| gets the content of slot j + |k|, and every slot j ≥ N − |k| gets zero (with N the number of slots).

For k > 0, this procedure is implemented very similarly to the rotate procedure above, except that in the last iteration (processing the most-significant digit) we replace the operation of rotate-by-$e_1^{(k)}$ along the 1’st dimension by shift-by-$e_1^{(k)}$ along the 1’st dimension (and similarly use shift-by-$(1 + e_1^{(k)})$ rather than rotate-by-$(1 + e_1^{(k)})$). For a negative amount −N < k < 0, we use the same procedure up to the last iteration with amount N + k, and in the last iteration use shift-by-e′ and shift-by-(1 + e′) along the 1st dimension, for the negative number $e' = e_1^{(N+k)} - \mathrm{ord}(f_1)$.
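The shift variant differs from the rotation procedure only in the most significant dimension, where rotations are replaced by zero-filling shifts. These shifts are by $e_1^{(k)}$ and $1 + e_1^{(k)}$ for k > 0, or by e′ and 1 + e′ for k < 0. The self-contained C++ model below follows that description on plaintext vectors. It is our own sketch: move1D and shiftLinear are hypothetical helper names, dimensions are 0-based, and out-of-range amounts (|k| ≥ N) simply return all zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Slots = std::vector<long>;

static long strideOf(const std::vector<long>& sizes, std::size_t i) {
  long s = 1;
  for (std::size_t j = i + 1; j < sizes.size(); ++j) s *= sizes[j];
  return s;
}

// Moves every item by k along dimension i. With zeroFill == false this is
// rotate1D; with zeroFill == true, items that would cross the boundary are
// dropped and their destination slots stay zero (shift1D).
static Slots move1D(const Slots& c, const std::vector<long>& sizes,
                    std::size_t i, long k, bool zeroFill) {
  const long st = strideOf(sizes, i), ord = sizes[i];
  const long r = ((k % ord) + ord) % ord;
  Slots out(c.size(), 0);
  for (long p = 0; p < static_cast<long>(c.size()); ++p) {
    const long ei = (p / st) % ord;
    const bool crosses = (ei + k >= ord) || (ei + k < 0);
    if (!zeroFill || !crosses) out[p + (((ei + r) % ord) - ei) * st] = c[p];
  }
  return out;
}

// Linear shift with zero fill: rotations in every dimension except the
// most significant one (dimension 0), which uses shifts instead.
Slots shiftLinear(Slots c, const std::vector<long>& sizes, long k) {
  const long N = static_cast<long>(c.size());
  const std::size_t n = sizes.size();
  if (k >= N || k <= -N) return Slots(N, 0);
  if (k == 0) return c;
  const long amt = (k > 0) ? k : N + k;     // for k < 0 use the digits of N + k
  std::vector<long> e(n);
  for (std::size_t i = 0; i < n; ++i)
    e[i] = (amt / strideOf(sizes, i)) % sizes[i];
  // Amount in dimension i: e_i, except e' = e_1 - ord(f_1) in dimension 0 for k < 0.
  auto amountFor = [&](std::size_t i) -> long {
    return (i == 0 && k < 0) ? e[0] - sizes[0] : e[i];
  };

  c = move1D(c, sizes, n - 1, amountFor(n - 1), n == 1);
  std::vector<long> carry(N);
  for (long p = 0; p < N; ++p) carry[p] = (p % sizes[n - 1]) < e[n - 1];

  for (std::size_t i = n - 1; i-- > 0;) {
    const Slots c1 = move1D(c, sizes, i, amountFor(i), i == 0);
    const Slots c2 = move1D(c, sizes, i, amountFor(i) + 1, i == 0);
    const long st = strideOf(sizes, i);
    for (long p = 0; p < N; ++p) {
      c[p] = c2[p] * carry[p] + c1[p] * (1 - carry[p]);   // MUX
      const long d = (p / st) % sizes[i];
      carry[p] = (d < e[i] || (d == e[i] && carry[p] != 0)) ? 1 : 0;
    }
  }
  return c;
}
```

With sizes {2, 3}, shifting {1, 2, 3, 4, 5, 6} by 2 gives {0, 0, 1, 2, 3, 4}, and by −2 gives {3, 4, 5, 6, 0, 0}. In the selected copy, a slot ends up zero exactly when the item that would cyclically land there had to wrap around in the most significant digit.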

Other operations. In addition to the rotation methods above, the classes EncryptedArray and EncryptedArrayMod2r also provide convenience methods that handle both encoding and homomorphic operations in one shot. The class EncryptedArray uses type vector<GF2X> for a plaintext array


16.  Design  and  Implementation  of  a  Homomorphic-Encryption  Library 


(since the plaintext values can be elements in an extension field $\mathbb{F}_{2^n}$), whereas class EncryptedArrayMod2r uses type vector<long> for the same purpose (since the plaintext values in this case are integers). The methods that are provided are the following:

// Fill the array with random plaintext data
void random(vector<GF2X>& array) const;   // EncryptedArray
void random(vector<long>& array) const;   // EncryptedArrayMod2r

// Encode the given array in a polynomial
void encode(ZZX& ptxt, const vector<GF2X>& array) const;
void encode(ZZX& ptxt, const vector<long>& array) const;

// Decode the given polynomial into an array of plaintext values
void decode(vector<long>& array, const ZZX& ptxt) const;
void decode(vector<GF2X>& array, const ZZX& ptxt) const;

// Multiply the ciphertext by a polynomial encoding the given array
void multByConst(Ctxt& ctxt, const vector<GF2X>& array);
void multByConst(Ctxt& ctxt, const vector<long>& array);

// Add to the ciphertext a polynomial encoding the given array
void addConst(Ctxt& ctxt, const vector<GF2X>& array);
void addConst(Ctxt& ctxt, const vector<long>& array);

// MUX: for p=encode(selector), set c1 = c1*p + c2*(1-p)
void select(Ctxt& c1, const Ctxt& c2, const vector<GF2X>& selector) const;
void select(Ctxt& c1, const Ctxt& c2, const vector<long>& selector) const;

// Encode the array in a polynomial, then encrypt it in the ciphertext c
void encrypt(Ctxt& c, const FHEPubKey& pKey, const vector<GF2X>& array) const;
void encrypt(Ctxt& c, const FHEPubKey& pKey, const vector<long>& array) const;

// Decrypt the ciphertext c, then decode the result into the array
void decrypt(const Ctxt& c, const FHESecKey& sKey, vector<GF2X>& array) const;
void decrypt(const Ctxt& c, const FHESecKey& sKey, vector<long>& array) const;
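For reference, the slot-wise meaning of addConst, multByConst and select in the $\mathbb{Z}_{2^r}$ case can be written out on plain integer arrays. This C++ sketch uses hypothetical helper names (plainAddConst, plainMultByConst, plainSelect). It does not call the library; it only spells out what the decrypted slots should contain, assuming the selector entries are 0 or 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Array = std::vector<long>;

// Reduce x into [0, mod).
static long reduce(long x, long mod) { return ((x % mod) + mod) % mod; }

// Slot-wise a + b mod 2^r: what decrypting after addConst should give.
Array plainAddConst(const Array& a, const Array& b, long mod) {
  Array out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = reduce(a[i] + b[i], mod);
  return out;
}

// Slot-wise a * b mod 2^r: the plaintext effect of multByConst.
Array plainMultByConst(const Array& a, const Array& b, long mod) {
  Array out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = reduce(a[i] * b[i], mod);
  return out;
}

// MUX as in select: c1*p + c2*(1-p), slot by slot, with p the selector.
Array plainSelect(const Array& c1, const Array& c2, const Array& p, long mod) {
  Array out(c1.size());
  for (std::size_t i = 0; i < c1.size(); ++i)
    out[i] = reduce(c1[i] * p[i] + c2[i] * (1 - p[i]), mod);
  return out;
}
```

For example, with r = 5 (mod = 32), selecting between {1, 2, 3} and {4, 5, 6} with selector {1, 0, 1} gives {1, 5, 3}.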


5  Using  the  Library 

Below we provide two examples of how this library can be used by an application program. These examples compute a simple circuit with homomorphic arithmetic over either GF(2^8) (represented using the AES polynomial, $G(X) = X^8 + X^4 + X^3 + X + 1$), or over $\mathbb{Z}_{2^5}$.
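As a plaintext companion to the GF(2^8) example, the sketch below multiplies two field elements represented as bytes (bit i holds the coefficient of X^i) and reduces by the AES polynomial G(X). It is a standalone C++ function (gfMul is our own name), not the NTL GF2X arithmetic the library uses. It may help when checking decrypted slot values in small tests.

```cpp
#include <cassert>
#include <cstdint>

// Multiplication in GF(2^8) = GF(2)[X]/(G(X)), with the AES polynomial
// G(X) = X^8 + X^4 + X^3 + X + 1 (0x11B).
std::uint8_t gfMul(std::uint8_t a, std::uint8_t b) {
  std::uint8_t result = 0;
  while (b != 0) {
    if (b & 1) result ^= a;                  // add (XOR) the current a * X^j
    const bool overflow = (a & 0x80) != 0;   // will a * X have degree 8?
    a = static_cast<std::uint8_t>(a << 1);   // multiply a by X
    if (overflow) a ^= 0x1B;                 // reduce: X^8 = X^4 + X^3 + X + 1
    b >>= 1;
  }
  return result;
}
```

For example, gfMul(0x57, 0x83) returns 0xC1, the worked example from the AES specification (FIPS-197).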
5.1 Homomorphic Operations over GF(2^8)

/*** Determine the parameters (cf. [5, Appendix C]) ***/
long ptxtSpace = 2;
long nDigits = 2;   // # of digits/# of columns in key-switching matrices
long k = 80;        // security parameter
long weight = 64;   // Hamming weight of secret keys
long lvls = 3;      // number of ciphertext-primes in the modulus chain
long m = 11441;     // the parameter m, defining Zm^* and Phi_m(X)

/*** Setup the various tables, and choose the keys ***/
FHEcontext context(m);                   // initialize a new context for the parameter m
buildModChain(context, lvls, nDigits);   // build the modulus chain
FHESecKey secretKey(context);            // initialize a secret-key object
const FHEPubKey& publicKey = secretKey;  // use the same object as a public-key
secretKey.GenSecKey(weight, ptxtSpace);  // draw a random secret key
addSome1DMatrices(secretKey);            // compute some key-switching matrices
// We could also use add1DMatrices instead of addSome1DMatrices

GF2X G;  // G is the AES polynomial, G(X) = X^8 + X^4 + X^3 + X + 1
SetCoeff(G, 8); SetCoeff(G, 4); SetCoeff(G, 3); SetCoeff(G, 1); SetCoeff(G, 0);

EncryptedArray ea(context, G);  // An EncryptedArray object, encoding wrt G
long nslots = ea.size();        // number of plaintext slots

/*** Encrypt random arrays over GF(2^8) ***/
vector<GF2X> p0, p1, p2, p3;  // Choose random arrays
ea.random(p0);
ea.random(p1);
ea.random(p2);
ea.random(p3);

vector<GF2X> const1, const2;  // two more random "constants"
ea.random(const1);
ea.random(const2);

ZZX const1_poly, const2_poly;  // encode constants as polynomials
ea.encode(const1_poly, const1);
ea.encode(const2_poly, const2);

// Encrypt the random arrays
Ctxt c0(publicKey), c1(publicKey), c2(publicKey), c3(publicKey);
ea.encrypt(c0, publicKey, p0);
ea.encrypt(c1, publicKey, p1);
ea.encrypt(c2, publicKey, p2);
ea.encrypt(c3, publicKey, p3);
/*** Perform homomorphic operations ***/
c1.multiplyBy(c0);  // also does mod-switching, key-switching
c0.addConstant(const1_poly);
c2.multByConstant(const2_poly);

Ctxt tmp(c1);  // tmp = c1
long amt = RandomBnd(2*(nslots/2)+1)-(nslots/2);  // in [-nslots/2..nslots/2]
ea.shift(tmp, amt);  // shift tmp by amt
c2 += tmp;           // then add to c2

amt = RandomBnd(2*nslots-1) - (nslots-1);  // in [-(nslots-1)..nslots-1]
ea.rotate(c2, amt);

c1.negate();
c3.multiplyBy(c2);
c0 -= c3;

/*** Decrypt the results of the computation ***/
vector<GF2X> pp0, pp1, pp2, pp3;
ea.decrypt(c0, secretKey, pp0);
ea.decrypt(c1, secretKey, pp1);
ea.decrypt(c2, secretKey, pp2);
ea.decrypt(c3, secretKey, pp3);


5.2 Homomorphic Operations over $\mathbb{Z}_{2^5}$

This example is almost identical to the previous one, except that the FHEcontext is also initialized with the parameter r = 5, we use EncryptedArrayMod2r instead of EncryptedArray and vector<long> instead of vector<GF2X>, and we do not need the polynomial G.

/*** Determine the parameters (cf. [5, Appendix C]) ***/
long r = 5;
long ptxtSpace = 1L << r;  // plaintext space modulo 2^5
long nDigits = 2;          // # of digits/# of columns in key-switching matrices
long k = 80;               // security parameter
long weight = 64;          // Hamming weight of secret keys
long lvls = 7;             // number of ciphertext-primes in the modulus chain
long m = 11441;            // the parameter m, defining Zm^* and Phi_m(X)

/*** Setup the various tables, and choose the keys ***/
FHEcontext context(m, r);                // initialize a new context for the parameters m,r
buildModChain(context, lvls, nDigits);   // build the modulus chain
FHESecKey secretKey(context);            // initialize a secret-key object
const FHEPubKey& publicKey = secretKey;  // use the same object as a public-key
secretKey.GenSecKey(weight, ptxtSpace);  // draw a random secret key
addSome1DMatrices(secretKey);            // compute some key-switching matrices
// We could also use add1DMatrices instead of addSome1DMatrices

EncryptedArrayMod2r ea(context);  // An EncryptedArrayMod2r object
long nslots = ea.size();          // number of plaintext slots


/*** Encrypt random arrays over Z_{32} ***/
vector<long> p0, p1, p2, p3;  // Choose random arrays
ea.random(p0);
ea.random(p1);
ea.random(p2);
ea.random(p3);

vector<long> const1, const2;  // two more random "constants"
ea.random(const1);
ea.random(const2);

ZZX const1_poly, const2_poly;  // encode constants as polynomials
ea.encode(const1_poly, const1);
ea.encode(const2_poly, const2);

// Encrypt the random arrays
Ctxt c0(publicKey), c1(publicKey), c2(publicKey), c3(publicKey);
ea.encrypt(c0, publicKey, p0);
ea.encrypt(c1, publicKey, p1);
ea.encrypt(c2, publicKey, p2);
ea.encrypt(c3, publicKey, p3);

/*** Perform homomorphic operations ***/
c1.multiplyBy(c0);  // also does mod-switching, key-switching
c0.addConstant(const1_poly);
c2.multByConstant(const2_poly);

Ctxt tmp(c1);  // tmp = c1
long amt = RandomBnd(2*(nslots/2)+1)-(nslots/2);  // in [-nslots/2..nslots/2]
ea.shift(tmp, amt);  // shift tmp by amt
c2 += tmp;           // then add to c2

amt = RandomBnd(2*nslots-1) - (nslots-1);  // in [-(nslots-1)..nslots-1]
ea.rotate(c2, amt);

c1.negate();
c3.multiplyBy(c2);
c0 -= c3;

/*** Decrypt the results of the computation ***/
vector<long> pp0, pp1, pp2, pp3;
ea.decrypt(c0, secretKey, pp0);
ea.decrypt(c1, secretKey, pp1);
ea.decrypt(c2, secretKey, pp2);
ea.decrypt(c3, secretKey, pp3);
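To check the decrypted arrays pp0, …, pp3, one can replay the same circuit on the plaintext arrays. The C++ sketch below is a hypothetical reference for the $\mathbb{Z}_{2^5}$ example (referenceCircuit and its helpers are our own names). It assumes the random shift and rotation amounts are recorded and passed in. It uses the linear-array semantics of shift and rotate given earlier, with all arithmetic modulo $2^r$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Array = std::vector<long>;

struct CircuitOutputs { Array p0, p1, p2, p3; };

static long reduce(long x, long mod) { return ((x % mod) + mod) % mod; }

// rotate: the content of slot j moves to slot j + k (mod N).
static Array rotatePlain(const Array& a, long k) {
  const long N = static_cast<long>(a.size());
  Array out(a.size());
  for (long j = 0; j < N; ++j) out[reduce(j + k, N)] = a[j];
  return out;
}

// shift: slot j gets the content of slot j - k if that slot exists, else 0.
static Array shiftPlain(const Array& a, long k) {
  const long N = static_cast<long>(a.size());
  Array out(a.size(), 0);
  for (long j = 0; j < N; ++j)
    if (j - k >= 0 && j - k < N) out[j] = a[j - k];
  return out;
}

// Replays the homomorphic circuit of the example on plaintext arrays.
CircuitOutputs referenceCircuit(Array p0, Array p1, Array p2, Array p3,
                                const Array& const1, const Array& const2,
                                long amt1, long amt2, long mod) {
  const std::size_t n = p0.size();
  for (std::size_t i = 0; i < n; ++i) p1[i] = reduce(p1[i] * p0[i], mod);      // c1.multiplyBy(c0)
  for (std::size_t i = 0; i < n; ++i) p0[i] = reduce(p0[i] + const1[i], mod);  // c0.addConstant
  for (std::size_t i = 0; i < n; ++i) p2[i] = reduce(p2[i] * const2[i], mod);  // c2.multByConstant
  const Array tmp = shiftPlain(p1, amt1);                                     // ea.shift(tmp, amt)
  for (std::size_t i = 0; i < n; ++i) p2[i] = reduce(p2[i] + tmp[i], mod);     // c2 += tmp
  p2 = rotatePlain(p2, amt2);                                                 // ea.rotate(c2, amt)
  for (std::size_t i = 0; i < n; ++i) p1[i] = reduce(-p1[i], mod);            // c1.negate()
  for (std::size_t i = 0; i < n; ++i) p3[i] = reduce(p3[i] * p2[i], mod);      // c3.multiplyBy(c2)
  for (std::size_t i = 0; i < n; ++i) p0[i] = reduce(p0[i] - p3[i], mod);      // c0 -= c3
  return {p0, p1, p2, p3};
}
```

The same replay would apply to the GF(2^8) example with XOR in place of addition and subtraction (negation is the identity in characteristic 2) and field multiplication in place of integer multiplication.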
A Proof of noise-estimate

Recall that we observed empirically that for a random Hamming-weight-H polynomial s with coefficients −1/0/1 and an integral power r we have $E[|s(\tau)|^{2r}] \approx r! \cdot H^r$, where τ is the principal complex m-th root of unity, $\tau = e^{2\pi i/m}$.

To simplify the proof, we analyze the case that each coefficient of s is chosen independently at random from −1/0/1 (taking each of ±1 with probability $H/(2m)$), so that the expected Hamming weight is H. Also, we assume that s is chosen as a degree-(m − 1) polynomial (rather than degree $\varphi(m) - 1$).

Theorem 1. Suppose m, r, H are positive integers, with H ≤ m, and let $\tau = e^{2\pi i/m} \in \mathbb{C}$. Suppose that we choose $f_0, \ldots, f_{m-1}$ independently, where for $i = 0, \ldots, m-1$, $f_i$ is ±1 with probability $H/(2m)$ each and 0 with probability $1 - H/m$. Let $f(X) = \sum_{i=0}^{m-1} f_i X^i$. Then for fixed r, as H → ∞ and m → ∞, we have

$$E[|f(\tau)|^{2r}] \sim r!\,H^r.$$

In particular, for $H \ge 2r^2$, we have

$$\left|\frac{E[|f(\tau)|^{2r}]}{r!\,H^r} - 1\right| \le \frac{2r^2}{H} + \frac{2^{r+1}r^2}{m}.$$

Before  proving  Theorem  1,  we  introduce  some  notation  and  prove  some  technical  results  that 
will  be  useful. 

Recall the “falling factorial” notation: for integers n, k with 0 ≤ k ≤ n, we define $n^{\underline{k}} = n(n-1)\cdots(n-k+1)$.

Lemma 1. For $n \ge k^2 > 0$, we have $n^k - n^{\underline{k}} \le k^2 n^{k-1}$.

Proof. We have

$$n^{\underline{k}} \ge (n-k)^k = n^k - k^2 n^{k-1} + \binom{k}{2} k^2 n^{k-2} - \binom{k}{3} k^3 n^{k-3} + \cdots.$$

The lemma follows by verifying that when $n \ge k^2$, in the above binomial expansion, the sum of every consecutive positive/negative pair of terms after the first two is non-negative. □


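Lemma 2. For $n \ge 2k^2 > 0$, we have $n^k \le 2n^{\underline{k}}$.

Proof. This follows immediately from the previous lemma: since $n \ge 2k^2$, we get $n^k - n^{\underline{k}} \le k^2 n^{k-1} \le n^k/2$. □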


Next, we recall the notion of the Stirling number of the second kind, which is the number of ways to partition a set of ℓ objects into k non-empty subsets, and is denoted ${\ell \brace k}$. We use the following standard result:

$$\sum_{k=1}^{\ell} {\ell \brace k}\, n^{\underline{k}} = n^{\ell}. \tag{1}$$


Finally, we define $M_{2n}$ to be the number of perfect matchings in the complete graph on 2n vertices, and $M_{n,n}$ to be the number of perfect matchings on the complete bipartite graph on two sets of n vertices. The following facts are easy to establish:

$$M_{n,n} = n! \tag{2}$$

and

$$M_{2n} \le 2^n\, n!. \tag{3}$$
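The two counting facts can be checked directly. A perfect matching of the complete bipartite graph on two sets of n vertices is the same as a bijection between the two sets, which gives (2). For (3), match the vertices of the complete graph one at a time and use $\binom{2n}{n} \le 4^n$:

```latex
M_{n,n} = n!, \qquad
M_{2n} = (2n-1)(2n-3)\cdots 3 \cdot 1
       = \frac{(2n)!}{2^n\, n!}
       = \binom{2n}{n}\,\frac{n!}{2^n}
       \le 4^n\,\frac{n!}{2^n}
       = 2^n\, n!.
```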

We now turn to the proof of the theorem. We have

$$|f(\tau)|^{2r} = f(\tau)^r\,\overline{f(\tau)}^{\,r} = \sum_{i_1,\ldots,i_{2r}} f_{i_1}\cdots f_{i_{2r}}\,\tau^{i_1}\cdots\tau^{i_r}\,\tau^{-i_{r+1}}\cdots\tau^{-i_{2r}}.$$

We will extend the usual notion of expected values to complex-valued random variables: if U and V are real-valued random variables, then $E[U + Vi] = E[U] + E[V]i$. The usual rules for sums and products of expectations work equally well. By linearity of expectation, we have

$$E[|f(\tau)|^{2r}] = \sum_{i_1,\ldots,i_{2r}} E[f_{i_1}\cdots f_{i_{2r}}]\,\tau^{i_1}\cdots\tau^{i_r}\,\tau^{-i_{r+1}}\cdots\tau^{-i_{2r}}. \tag{4}$$
Here, each index $i_t$ runs over the set {0, …, m − 1}. In this sum, because of independence and the fact that any odd power of $f_i$ has expected value 0, the only terms that contribute a non-zero value are those in which each index value occurs an even number of times, in which case, if there are k distinct values among $i_1, \ldots, i_{2r}$, we have

$$E[f_{i_1}\cdots f_{i_{2r}}] = (H/m)^k.$$

We want to regroup the terms in (4). To this end, we introduce some notation: for an integer $t \in \{1, \ldots, 2r\}$ define w(t) = 1 if t ≤ r, and w(t) = −1 if t > r; for a subset $e \subseteq \{1, \ldots, 2r\}$, define $w(e) = \sum_{t \in e} w(t)$. We call w(e) the “weight” of e. Then we have:

$$E[|f(\tau)|^{2r}] = \sum_{P=\{e_1,\ldots,e_k\}} (H/m)^k \sum^{*}_{j_1,\ldots,j_k} \tau^{j_1 w(e_1)+\cdots+j_k w(e_k)}. \tag{5}$$

Here, the outer summation is over all “even” partitions $P = \{e_1, \ldots, e_k\}$ of the set {1, …, 2r}, where each element of the partition has an even cardinality. The inner summation is over all sequences of indices $j_1, \ldots, j_k$, where each index runs over the set {0, …, m − 1}, but where no value in the sequence is repeated; the starred summation notation emphasizes this restriction.

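Since |τ| = 1, it is clear that

$$\left| E[|f(\tau)|^{2r}] - \sum_{P=\{e_1,\ldots,e_k\}} (H/m)^k \sum_{j_1,\ldots,j_k} \tau^{j_1 w(e_1)+\cdots+j_k w(e_k)} \right| \le \sum_{P=\{e_1,\ldots,e_k\}} (H/m)^k \left(m^k - m^{\underline{k}}\right). \tag{6}$$

Note that in this inequality the inner sum on the left is over all sequences of indices $j_1, \ldots, j_k$, without the restriction that the indices in the sequence are unique. The unrestricted sum has $m^k$ terms, of which exactly $m^{\underline{k}}$ have pairwise distinct indices, and every term has absolute value 1.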

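Our first task is to bound the sum on the right-hand side of (6). Observe that any even partition $P = \{e_1, \ldots, e_k\}$ can be formed by merging the edges of some perfect matching on the complete graph on vertices {1, …, 2r}. So, using k ≤ r, we have

$$\begin{aligned}
\sum_{P=\{e_1,\ldots,e_k\}} (H/m)^k \left(m^k - m^{\underline{k}}\right)
&\le \sum_{P=\{e_1,\ldots,e_k\}} (H/m)^k\, k^2 m^{k-1} && \text{(by Lemma 1)}\\
&\le \frac{r^2}{m} \sum_{P=\{e_1,\ldots,e_k\}} H^k\\
&\le \frac{r^2}{m}\, M_{2r} \sum_{k=1}^{r} {r \brace k} H^k && \text{(partitions formed from matchings)}\\
&\le \frac{r^2\, 2^r\, r!}{m} \sum_{k=1}^{r} {r \brace k} H^k && \text{(by (3))}\\
&\le \frac{r^2\, 2^{r+1}\, r!}{m} \sum_{k=1}^{r} {r \brace k} H^{\underline{k}} && \text{(by Lemma 2)}\\
&= \frac{2^{r+1}\, r^2\, r!}{m}\, H^r && \text{(by (1))}.
\end{aligned}$$

Combining this with (6), we have

$$\left| E[|f(\tau)|^{2r}] - \sum_{P=\{e_1,\ldots,e_k\}} (H/m)^k \sum_{j_1,\ldots,j_k} \tau^{j_1 w(e_1)+\cdots+j_k w(e_k)} \right| \le r!\,H^r\,\frac{2^{r+1}r^2}{m}. \tag{7}$$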
So now consider the inner sum in (7). The weights $w(e_1), \ldots, w(e_k)$ are integers bounded by r in absolute value, and r is strictly less than m by the assumption $2r^2 \le H \le m$. If any weight, say $w(e_1)$, is non-zero, then $\tau^{w(e_1)}$ has multiplicative order dividing m but different from 1, and so the sum $\sum_{j_1} \tau^{j_1 w(e_1)}$ vanishes, and hence

$$\sum_{j_1,\ldots,j_k} \tau^{j_1 w(e_1)+\cdots+j_k w(e_k)} = \Big(\sum_{j_1} \tau^{j_1 w(e_1)}\Big)\Big(\sum_{j_2,\ldots,j_k} \tau^{j_2 w(e_2)+\cdots+j_k w(e_k)}\Big) = 0.$$

Otherwise, if all the weights $w(e_1), \ldots, w(e_k)$ are zero, then

$$\sum_{j_1,\ldots,j_k} \tau^{j_1 w(e_1)+\cdots+j_k w(e_k)} = m^k.$$

We therefore have

$$\sum_{P=\{e_1,\ldots,e_k\}} (H/m)^k \sum_{j_1,\ldots,j_k} \tau^{j_1 w(e_1)+\cdots+j_k w(e_k)} = \sum_{\substack{P=\{e_1,\ldots,e_k\}\\ w(e_1)=\cdots=w(e_k)=0}} H^k. \tag{8}$$


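Observe that any partition $P = \{e_1, \ldots, e_k\}$ with $w(e_1) = \cdots = w(e_k) = 0$ can be formed by merging the edges of some perfect matching on the complete bipartite graph with vertex sets {1, …, r} and {r + 1, …, 2r}. The total number of such matchings is r! (see (2)), and each matching, taken without merging, is itself such a partition with k = r. So we have

$$\begin{aligned}
0 \le \sum_{\substack{P=\{e_1,\ldots,e_k\}\\ w(e_1)=\cdots=w(e_k)=0}} H^k - r!\,H^r
&\le r! \sum_{k=1}^{r-1} {r \brace k} H^k\\
&\le 2\,r! \sum_{k=1}^{r-1} {r \brace k} H^{\underline{k}} && \text{(by Lemma 2)}\\
&= 2\,r!\,\left(H^r - H^{\underline{r}}\right) && \text{(by (1))}\\
&\le 2\,r!\,r^2 H^{r-1} && \text{(by Lemma 1)}.
\end{aligned}$$

Combining this with (7) and (8) proves the theorem.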
Using Homomorphic Encryption for
Large  Scale  Statistical  Analysis 


CURIS  2012 

David  Wu  and  Jacob  Haven 
Advised  by  Professor  Dan  Boneh 


-  Motivation  - 

Cloud-based solutions have become increasingly popular in the past few years. An example of the cloud-based model is shown below. Here, three different hospitals provide data to the cloud. The cloud computing platform then analyzes and extracts useful information from the data.


One of the main concerns with cloud computing has been the privacy and confidentiality of the data. One solution is to send the data encrypted to the cloud. However, we still need to support useful computations on the encrypted data. Fully homomorphic encryption (FHE) is a way of supporting such computations on encrypted data.

We note that while other mechanisms exist for secure computation, they generally require the different data providers to exchange information. Because FHE schemes are public-key schemes, FHE is much better suited for the scenario where we have many sources of data.

-  Our  Approach  - 

Due to the significant overhead in homomorphic computation, implementations of homomorphic encryption schemes for statistical analysis have been limited to small datasets (~100 data points) and low-dimensional data (~2-4 dimensions).

Using recent techniques in batched computation and a different message encoding scheme, we demonstrate the viability of using leveled homomorphic encryption to compute on datasets with over a million elements as well as datasets of much higher dimension.

In particular, we consider two applications of homomorphic encryption: computing the mean and covariance of multivariate data and performing linear regression over encrypted datasets.


-  Computation  over  Large  Integers  - 

To support computation over large amounts of data, we need to be able to handle large integers (i.e., 128-bit precision). However, it is not computationally feasible to choose message spaces of this magnitude. To support computations with at least 128-bit precision, we leverage the Chinese Remainder Theorem (CRT):


[Figure: CRT-based computation (labels: p1, p2, Data1, Result1, Data2, Result2, CRT)]


# We choose primes $p_1, p_2, \ldots, p_k$ such that $p_1 p_2 \cdots p_k > 2^{128}$.

# We perform the computation modulo each prime. Given the results of the computation with respect to each prime, we apply the CRT to obtain the value modulo the product of the primes (at least 128-bit precision).

# The computations with respect to each prime are completely independent of the computations with respect to the other primes. As such, all of the computations are naturally parallelizable.
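The per-prime computation followed by CRT reconstruction can be sketched with ordinary 64-bit integers. The primes 101, 103 and 107 used below are toy values chosen only so that the arithmetic fits in long long. A real deployment would need primes whose product exceeds 2^128 and multi-precision arithmetic for the final recombination. The function names are hypothetical, and the data is assumed non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inverse of a modulo m via the extended Euclidean algorithm
// (assumes gcd(a, m) = 1).
long long modInverse(long long a, long long m) {
  long long oldR = ((a % m) + m) % m, r = m, oldS = 1, s = 0;
  while (r != 0) {
    const long long q = oldR / r;
    long long t = oldR - q * r; oldR = r; r = t;
    t = oldS - q * s; oldS = s; s = t;
  }
  return ((oldS % m) + m) % m;
}

// Combine residues res[i] modulo pairwise-coprime moduli p[i] into the
// unique value modulo P = p[0] * ... * p[k-1] (small moduli only).
long long crtCombine(const std::vector<long long>& res,
                     const std::vector<long long>& p) {
  long long P = 1;
  for (long long pi : p) P *= pi;
  long long x = 0;
  for (std::size_t i = 0; i < p.size(); ++i) {
    const long long Mi = P / p[i];
    const long long term = res[i] % p[i] * modInverse(Mi % p[i], p[i]) % p[i];
    x = (x + term * Mi) % P;   // term < p[i], so term * Mi < P
  }
  return x;
}

// The same computation (here: a sum) done independently modulo each prime,
// then recombined with the CRT.
long long sumViaCrt(const std::vector<long long>& data,
                    const std::vector<long long>& primes) {
  std::vector<long long> residues;
  for (long long q : primes) {   // each prime's work is independent
    long long acc = 0;
    for (long long v : data) acc = (acc + v % q) % q;
    residues.push_back(acc);
  }
  return crtCombine(residues, primes);
}
```

For example, sumViaCrt({123456, 654321, 111111}, {101, 103, 107}) recovers 888888, since the true sum is below 101 · 103 · 107 = 1113121.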


-  Homomorphic  Encryption  Scheme  (Client  Side)  - 

Leveled fully homomorphic encryption (FHE) schemes support addition and multiplication over ciphertexts. Such schemes are capable of evaluating boolean circuits with bounded depth (determined by the number of multiplications) over ciphertexts, and thus can perform many computations over the ciphertext.


Consider a scenario where Charlie wants to compute the inner product of Alice’s and Bob’s data. Note that covariance computation and linear regression can be expressed in terms of matrix products, which can be viewed as a series of inner products.
[Figure: client-side workflow, with panels labeled Batching, Encryption, Evaluation and Decryption]


Clear blocks represent plaintext blocks while striped blocks represent ciphertext blocks. The shading in the ciphertext block represents the amount of noise in the ciphertext (explained below).

•  Batching:  Pack  multiple  plaintext  messages  (data  elements)  into  a  single  ciphertext  block.  This  enables 
the  server  to  evaluate  a  single  instruction  on  multiple  data  with  low  overhead. 

•  Encryption:  Encrypt  the  packed  plaintext  blocks  with  the  FHE  public  key  (Charlie’s  public  key). 

•  Evaluation:  Send  the  encrypted  data  to  the  cloud  server  for  processing. 

•  Decryption:  The  client  (Charlie)  decrypts  the  result  using  his  secret  key. 


-  Homomorphic  Encryption  Scheme  (Server  Side)  - 

Our leveled FHE scheme supports three basic operations: addition, multiplication, and Frobenius automorphisms. Below, we show how we can use these operations to compute the inner product on encrypted data.


(+) Element-wise addition of batched ciphertexts.

(x) Element-wise multiplication of batched ciphertexts.

Automorphism operation, which permutes the elements in the plaintext slots. Used here to rotate slots by i.

# Because the individual plaintext slots are non-interacting, we use a series of automorphisms to rotate the slots and additions to sum up the entries in a batch.


#  In  the  last  step  of  the  circuit,  we  zero  out  the  values  of  the  remaining  slots.  In  the  case  where  the  number 
of  slots  is  not  a  power  of  two,  this  ensures  that  no  additional  information  about  the  data  is  leaked.  The 
result  is  stored  in  the  first  slot  of  the  final  ciphertext. 

#  For  security,  we  must  add  noise  into  ciphertexts  during  the  encryption  process.  Homomorphic  operations 
on  ciphertexts  increase  this  noise.  In  order  to  decrypt  successfully,  the  noise  must  be  below  a  chosen 
threshold. 


# In fully homomorphic computation, multiplication is substantially more expensive (both in terms of runtime and amount of noise generated) than addition. We can quantify this by defining the depth of a circuit to be the largest number of multiplications along any path through the circuit. Evaluating deeper circuits requires larger parameters and, correspondingly, longer runtimes. A comparison of addition and multiplication runtimes for different circuit depths is given below:


Depth   Time to Perform 1 Addition (ms)   Time to Perform 1 Multiplication (s)
1       1.94                              0.62
2       3.23                              2.14
5       10.81                             14.11
10      25.44                             102.20
20      141.93                            771.55
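The rotate-and-add pattern used for inner products can be mimicked on plaintext slots. In this C++ sketch (our own model with hypothetical names), rotateLeft stands in for an automorphism. The data is assumed to occupy the first n of N slots with zeros elsewhere, and the smallest power of two ≥ n must not exceed N so the doubling rounds never wrap around. The final loop plays the role of the masking step that zeroes every slot except the first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Slots = std::vector<long>;

// Slot j receives slot (j + s) mod N: a plaintext stand-in for rotating a
// batched ciphertext with an automorphism.
static Slots rotateLeft(const Slots& v, std::size_t s) {
  Slots out(v.size());
  for (std::size_t j = 0; j < v.size(); ++j) out[j] = v[(j + s) % v.size()];
  return out;
}

// Inner product of the first n entries of a and b (all later slots zero):
// one slot-wise product, log2 rounds of rotate-and-add, then a 0/1 mask
// that keeps only slot 0, where the result ends up.
Slots innerProductSlots(const Slots& a, const Slots& b, std::size_t n) {
  const std::size_t N = a.size();
  std::size_t window = 1;                    // smallest power of two >= n
  while (window < n) window *= 2;
  assert(window <= N);                       // doubling must not wrap around
  Slots v(N);
  for (std::size_t j = 0; j < N; ++j) v[j] = a[j] * b[j];    // (x)
  for (std::size_t s = 1; s < window; s *= 2) {               // rotate, (+)
    const Slots r = rotateLeft(v, s);
    for (std::size_t j = 0; j < N; ++j) v[j] += r[j];
  }
  for (std::size_t j = 1; j < N; ++j) v[j] = 0;               // mask
  return v;
}
```

For example, the inner product of (1, 2, 3) and (4, 5, 6) stored in six slots ends up as 32 in slot 0 after two rotate-and-add rounds, with all other slots zeroed.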

- Experiments -

Below are results from timing tests illustrating the performance of linear regression as well as mean and covariance computation on different datasets. Running times are relative to one prime in the CRT decomposition.


[Figure: Timing Tests for Linear Regression as a Function of Data Dimension]

[Figure: Timing Tests for Linear Regression as a Function of Dataset Size]

[Figure: Timing Tests for Mean and Covariance Computation as a Function of Data Dimension]
- Conclusion -

We have constructed a scale-invariant leveled fully homomorphic encryption system.

Using batching and CRT-based message encoding, we are able to perform large-scale statistical analysis on millions of data points and data of moderate dimension.


